6 results for "Shafaei-Bajestan E"
Search Results
2. LDL-AURIS: A computational model, grounded in error-driven learning, for the comprehension of single spoken words
- Author
-
Baayen RH, Uhrig P, Moradipour-Tari M, and Shafaei-Bajestan E
- Subjects
- Error-driven learning, Text mining, Spoken word recognition, Computer science, Speech recognition
- Abstract
A computational model for auditory word recognition is presented that enhances the model of Arnold et al. (2017). Real-valued features are extracted from the speech signal instead of discrete features. One-hot encoding of words’ meanings is replaced by real-valued semantic vectors, with a small amount of noise added to safeguard discriminability. Instead of learning with Rescorla-Wagner updating, we use multivariate multiple regression, which captures discrimination learning at the limit of experience. These new design features substantially improve prediction accuracy for words extracted from spontaneous conversations. They also provide enhanced temporal granularity, enabling the modeling of cohort-like effects. Clustering with t-SNE shows that the acoustic form space captures phone-like similarities and differences. Thus, wide learning with high-dimensional vectors, no hidden layers, and no abstract mediating phone-like representations is not only possible but achieves excellent performance, approximating the lower bound of human accuracy on the challenging task of isolated word recognition.
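The regression step this abstract describes can be sketched as follows: a single least-squares solve for a cue-to-semantics weight matrix stands in for incremental Rescorla-Wagner updating, since the least-squares solution is the weight matrix that error-driven learning converges to in the limit of experience. All names and dimensions below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are word tokens, columns are
# real-valued acoustic features (C) and semantic dimensions (S).
n_tokens, n_features, n_sem = 200, 50, 20
C = rng.normal(size=(n_tokens, n_features))  # acoustic cue matrix
S = rng.normal(size=(n_tokens, n_sem))       # semantic vector matrix

# Multivariate multiple regression: one least-squares solve gives
# the mapping F that incremental error-driven updating would reach
# at the limit of experience.
F, *_ = np.linalg.lstsq(C, S, rcond=None)

# Comprehension: a token's acoustic features are mapped directly
# to a predicted semantic vector, with no hidden layers in between.
s_hat = C[0] @ F
```

The absence of hidden layers is what makes this "wide" rather than deep: the entire model is one weight matrix between inputs and outputs.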
- Published
- 2020
3. Wide Learning for Auditory Comprehension
- Author
-
Shafaei-Bajestan, E. and Baayen, R. H.
- Abstract
Classical linguistic, cognitive and engineering models for speech recognition and human auditory comprehension posit representations for sounds and words that mediate between the acoustic signal and interpretation. Recent advances in automatic speech recognition have shown, using deep learning, that state-of-the-art performance is obtained without such units. We present a cognitive model of auditory comprehension based on wide rather than deep learning that was trained on 20 to 80 hours of TV news broadcasts. Like deep network models, our model is an end-to-end system that does not make use of phonemes and phonological wordform representations. Nevertheless, it performs well on the difficult task of single word identification (model accuracy 11.37%, Mozilla DeepSpeech: 4.45%). The architecture of the model is a simple two-layered wide neural network with weighted connections between the acoustic frequency band features as inputs and lexical outcomes (pointers to semantic vectors) as outputs. Model performance shows hardly any degradation when trained on speech in noise rather than on clean speech. Performance was further enhanced by adding a second network to a standard wide network. The present word recognition module is designed to become part of a larger system modeling the comprehension of running speech.
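The decision rule of such a two-layered wide network can be sketched in a few lines: frequency-band features are mapped through one weight matrix to a predicted semantic vector, and the recognized word is the lexical outcome whose semantic vector is most similar to the prediction. The function name, dimensions, and cosine-similarity decision rule below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, n_acoustic, n_sem = 100, 300, 40

W = rng.normal(size=(n_acoustic, n_sem))     # learned input-to-output weights
lexicon = rng.normal(size=(n_words, n_sem))  # one semantic vector per word

def identify(acoustic_features, W, lexicon):
    """Map frequency-band features straight to a semantic vector,
    then pick the word whose vector is most similar (cosine)."""
    s_hat = acoustic_features @ W
    sims = (lexicon @ s_hat) / (
        np.linalg.norm(lexicon, axis=1) * np.linalg.norm(s_hat))
    return int(np.argmax(sims))

word_id = identify(rng.normal(size=n_acoustic), W, lexicon)
```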
- Published
- 2018
4. Language with vision: A study on grounded word and sentence embeddings.
- Author
-
Shahmohammadi H, Heitmeier M, Shafaei-Bajestan E, Lensch HPA, and Baayen RH
- Subjects
- Humans, Machine Learning, Visual Perception/physiology, Language, Natural Language Processing
- Abstract
Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of the language and our embodied experiences remains an open problem. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging the current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective computational grounding model for pre-trained word embeddings. Our model effectively balances the interplay between language and vision by aligning textual embeddings with visual information while simultaneously preserving the distributional statistics that characterize word usage in text corpora. By applying a learned alignment, we are able to indirectly ground unseen words, including abstract words. A series of evaluations on a range of behavioral datasets shows that visual grounding is beneficial not only for concrete words but also for abstract words, lending support to the indirect theory of abstract concepts. Moreover, our approach offers advantages for contextualized embeddings, such as those generated by BERT (Devlin et al., 2018), but only when trained on corpora of modest, cognitively plausible sizes. Code and grounded embeddings for English are available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2 .
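The key idea of a learned alignment that indirectly grounds unseen words can be sketched as follows. This is a minimal stand-in, not the paper's actual model (which additionally preserves distributional statistics): here a plain least-squares mapping is fitted from text embeddings of seen words to their image-derived vectors, and the same mapping is then applied to a word for which no image exists. All dimensions and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_seen, d_text, d_img = 300, 100, 64
T = rng.normal(size=(n_seen, d_text))  # pre-trained text embeddings (seen words)
V = rng.normal(size=(n_seen, d_img))   # image-derived vectors for the same words

# Fit the alignment on seen (word, image) pairs only.
A, *_ = np.linalg.lstsq(T, V, rcond=None)

# Indirect grounding: an unseen (e.g. abstract) word has only a text
# embedding, but the learned mapping still projects it into the
# visually grounded space.
grounded_unseen = rng.normal(size=d_text) @ A
```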
- Published
- 2024
5. The pluralization palette: unveiling semantic clusters in English nominal pluralization through distributional semantics.
- Author
-
Shafaei-Bajestan E, Moradipour-Tari M, Uhrig P, and Baayen RH
- Abstract
Using distributional semantics, we show that English nominal pluralization exhibits semantic clusters. For instance, the change in semantic space from singulars to plurals differs depending on whether a word denotes, e.g., a fruit or an animal. Languages with extensive noun classes, such as Swahili and Kiowa, distinguish between these kinds of words in their morphology. In English, even though it is not marked morphologically, plural semantics also varies by semantic class. A semantically informed method, CosClassAvg, is introduced and compared to two other methods, one implementing a fixed shift from singular to plural, and one creating plural vectors from singular vectors using a linear mapping (FRACSS). Compared to FRACSS, CosClassAvg predicted plural vectors that were more similar to the corpus-extracted plural vectors in terms of vector length, but somewhat less similar in terms of orientation. Both FRACSS and CosClassAvg outperform the method using a fixed shift vector to create plural vectors, which does not do justice to the intricacies of English plural semantics. A computational modeling study revealed that the observed difference between the plural semantics generated by these three methods carries over to how well a computational model of the listener can understand previously unencountered plural forms. Among all methods, CosClassAvg provides a good balance for the trade-off between productivity (being able to understand novel plural forms) and faithfulness to corpus-extracted plural vectors (i.e., understanding the particulars of the meaning of a given plural form). Supplementary Information: The online version contains supplementary material available at 10.1007/s11525-024-09428-9. Competing Interests: No potential conflict of interest was reported by the authors.
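The three methods compared in this abstract can be sketched side by side on toy vectors. The sketch below is an illustration under assumed data (random vectors and three arbitrary semantic classes), not the authors' implementation: a single average shift for all nouns, a per-class average shift in the spirit of CosClassAvg, and a FRACSS-style least-squares linear mapping.

```python
import numpy as np

rng = np.random.default_rng(2)
sg = rng.normal(size=(60, 50))             # singular vectors (toy data)
pl = sg + 0.1 * rng.normal(size=(60, 50))  # corpus-extracted plural vectors
cls = rng.integers(0, 3, size=60)          # assumed semantic class labels

shifts = pl - sg

# Fixed-shift baseline: one average shift vector for every noun.
fixed = sg + shifts.mean(axis=0)

# Class-average shift (CosClassAvg-style sketch): average the
# singular-to-plural shift separately within each semantic class.
class_avg = np.vstack([shifts[cls == k].mean(axis=0) for k in range(3)])
cosclass = sg + class_avg[cls]

# FRACSS-style linear mapping: least-squares matrix from singulars
# to plurals, applied to all singular vectors at once.
F, *_ = np.linalg.lstsq(sg, pl, rcond=None)
fracss = sg @ F
```

The contrast the paper draws is visible in the structure of the sketch: the fixed shift ignores semantic class entirely, while the class-average and linear-mapping methods let the predicted plural depend on where the singular sits in semantic space.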
- Published
- 2024
6. The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using linear discriminative learning.
- Author
-
Chuang YY, Vollmer ML, Shafaei-Bajestan E, Gahl S, Hendrix P, and Baayen RH
- Subjects
- Humans, Psycholinguistics, Semantics, Speech, Comprehension, Discrimination Learning
- Abstract
Pseudowords have long served as key tools in psycholinguistic investigations of the lexicon. A common assumption underlying the use of pseudowords is that they are devoid of meaning: Comparing words and pseudowords may then shed light on how meaningful linguistic elements are processed differently from meaningless sound strings. However, pseudowords may in fact carry meaning. On the basis of a computational model of lexical processing, linear discriminative learning (LDL; Baayen et al., Complexity, 2019), we compute numeric vectors representing the semantics of pseudowords. We demonstrate that quantitative measures gauging the semantic neighborhoods of pseudowords predict reaction times in the Massive Auditory Lexical Decision (MALD) database (Tucker et al., 2018). We also show that the model successfully predicts the acoustic durations of pseudowords. Importantly, model predictions hinge on the hypothesis that the mechanisms underlying speech production and comprehension interact. Thus, pseudowords emerge as an outstanding tool for gauging the resonance between production and comprehension. Many pseudowords in the MALD database contain inflectional suffixes. Unlike many contemporary models, LDL captures the semantic commonalities of forms sharing inflectional exponents without using the linguistic construct of morphemes. We discuss methodological and theoretical implications for models of lexical processing and morphological theory. The results of this study, complementing those on real words reported in Baayen et al. (Complexity, 2019), thus provide further evidence for the usefulness of LDL both as a cognitive model of the mental lexicon, and as a tool for generating new quantitative measures that are predictive for human lexical processing.
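How a pseudoword can "carry meaning" in such a model can be sketched as follows: the comprehension mapping is learned from real words only, yet it assigns a semantic vector to any form vector, including a pseudoword's, and that vector's neighborhood among real-word meanings can then be quantified. Everything below (random stand-ins for form cues, the particular top-k density measure) is an illustrative assumption, not the paper's exact measure.

```python
import numpy as np

rng = np.random.default_rng(3)
n_words, n_cues, n_sem = 500, 200, 50
C = rng.normal(size=(n_words, n_cues))  # form cue vectors of real words
S = rng.normal(size=(n_words, n_sem))   # semantic vectors of real words

# Learn the form-to-meaning mapping from real words alone.
F, *_ = np.linalg.lstsq(C, S, rcond=None)

# A pseudoword's form cues pass through the same mapping, yielding
# a semantic vector even though the form was never seen in training.
pseudo_sem = rng.normal(size=n_cues) @ F

# One possible neighborhood measure: mean cosine similarity to the
# 8 closest real-word semantic vectors.
sims = (S @ pseudo_sem) / (
    np.linalg.norm(S, axis=1) * np.linalg.norm(pseudo_sem))
density = np.sort(sims)[-8:].mean()
```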
- Published
- 2021