9,785 results for "Corpus"
Search Results
2. Detection of Potentially Non-compliant Clauses in Online ToS in Portuguese
- Author
-
Tocchini, Matheus, Rocha, Igor M., de Barros, Raphael M., e Silva, Jéssica O., Garcia, Ananda F., Zular, Felipe, Maranhão, Juliano, Sichman, Jaime, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Santos, Manuel Filipe, editor, Machado, José, editor, Novais, Paulo, editor, Cortez, Paulo, editor, and Moreira, Pedro Miguel, editor
- Published
- 2025
3. Collocation of Preposition Terhadap in Indonesian Language: A Corpus-Based Analysis
- Author
-
Notonegoro, Raya Jayawati Ratnawilis Amanah, Suhardijanto, Totok, Budiman, Manneke, Series Editor, Budianta, Melani, Series Editor, Kusno, Abidin, Series Editor, Padawangi, Rita, Series Editor, Stroupe, Richmond, editor, and Roosman, Lilie, editor
- Published
- 2025
4. ESDC: An open Earth science data corpus to support geoscientific literature information extraction.
- Author
-
Li, Hao, Yue, Peng, Tapete, Deodato, Cigna, Francesca, Wu, Qiuju, Xiang, Longgang, and Lu, Binbin
- Subjects
-
*LANGUAGE models, *KNOWLEDGE graphs, *DATA mining, *CHATGPT, *DATA science, *QUESTION answering systems
- Abstract
Over the past ten years, large amounts of original research data related to Earth system science have been made available at a rapidly increasing rate. Such growing data stock helps researchers understand the human-Earth system across different fields. A substantial amount of this data is published by geoscientists as open-access in authoritative journals. If the information stored in this literature is properly extracted, there is significant potential to build a domain knowledge base. However, this potential remains largely unfulfilled in geoscience, with one of the biggest obstacles being the lack of publicly available related corpora and baselines. To fill this gap, the Earth Science Data Corpus (ESDC), an academic text corpus of 600 abstracts, was built from the international journal Earth System Science Data (ESSD). To the best of our knowledge, ESDC is the first corpus with the needed detail to provide a professional training dataset for knowledge extraction and construction of domain-specific knowledge graphs from massive amounts of literature. The production process of ESDC incorporates both the contextual features of spatiotemporal entities and the linguistic characteristics of academic literature. Furthermore, annotation guidelines and procedures tailored for Earth science data are formulated to ensure reliability. ChatGPT with zero- and few-shot prompting, BARTNER generative, and W2NER discriminative models were trained on ESDC to evaluate the performance of the named entity recognition task and showed increasing performance metrics, with the highest achieved by BARTNER. Performance metrics for various entity types output by each model were also assessed. We utilized the trained BARTNER model to perform model inference on a larger unlabeled literature corpus, aiming to automatically extract a broader and richer set of entity information. Subsequently, the extracted entity information was mapped and associated with the Earth science data knowledge graph.
Around this knowledge graph, this paper validates multiple downstream applications, including hot topic research analysis, scientometric analysis, and knowledge-enhanced large language model question-answering systems. These applications have demonstrated that the ESDC can provide scientists from different disciplines with information on Earth science data, help them better understand and obtain data, and promote further exploration in their respective professional fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. What ratings and corpus data reveal about the vividness of Mandarin ABB words.
- Author
-
Van Hoey, Thomas, Yu, Xiaoyu, Pan, Tung-Le, and Do, Youngah
- Abstract
A well-known method of studying iconic words is through the collection of subjective ratings. We collected such ratings regarding familiarity, iconicity, imagery/imageability, concreteness, sensory experience rating (SER), valence and arousal for Mandarin ABB words. This is a type of phrasal compound consisting of a prosaic syllable A and a reduplicated BB part, resulting in a vivid phrasal compound, for example, wù-mángmáng 雾茫茫 'completely foggy'. The correlations between the newly collected ABB ratings are contrasted with two other sets of prosaic word ratings, demonstrating that variables that characterize ABB words in an absolute sense may not play a distinctive role when contrasted with other types of words. Next, we provide another angle for looking at ABB words, by investigating to what degree rating data converges with corpus data. By far, the variable that characterizes ABB items consistently throughout these case studies is their high score for imageability, showing that they are indeed rightfully characterized as vivid. Methodologically, we show that it pays off to not take rating data at face value but to contrast it with other comparable datasets of a different phenomenon or data about the same phenomenon compiled in an ontologically different manner. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. China opportunity or China threat? A corpus-based study of China's image in Australian news discourse.
- Author
-
Huan, Changpeng
- Subjects
-
*PUBLIC opinion, *HAZARDS, *CORPORA, *DISCOURSE
- Abstract
Australia has become increasingly dependent on China for economic prosperity, but the formidable resources China has brought in its engagement with Australia have been discursively constructed as a threat. Past studies have observed a conventional and often oversimplified binary perception of China–Australia relations (i.e. China opportunity or China threat). Premised on corpus evidence concerning the representation of China in Australian media discourses from 2009 to 2019, this article has shown China threat is considerably more complex than previously observed. China was not only constructed as a security threat but also an economic threat. Australian media contemporary imaginaries of China threat were related to continued fears of "Yellow Peril" and "Red Peril", which spread from the US and reinforced Australia's tendency to dependency. There is a danger of China threat dominating public perceptions of China, drifting into populism of offering simplistic solutions to complex issues. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Collocations of Pria, Lelaki, and Jantan as Representations of Masculinity in Indonesia.
- Author
-
Dari, Mika Wulan, Syahrani, Agus, and Asfar, Dedy Ari
- Subjects
HUMAN-animal relationships, HUMAN-plant relationships, COLLOCATION (Linguistics), DISCOURSE analysis, GENDER inequality, PUBLIC sphere, MASCULINITY
- Abstract
Language is one way to understand a society and its culture, including masculine norms. Exploring evolutionary masculinity through language is an intriguing concept to revisit. The research examines words synonymous with "men" in Indonesia and reviews their usage to depict current masculinity in the country. This research applied discourse analysis to corpora sourced from the Leipzig Corpora and CQPWeb. The data were analyzed using semantic preference to find meanings and semantic prosody to find connotations of pria, lelaki, and jantan. The findings reveal differences in the meanings and usage of the words pria, lelaki, and jantan. The difference in meaning is that pria is an adult male, whereas lelaki is a representation of men who are not limited in age, and jantan is interpreted as the genitals of animals or plants and men in the context of masculinity. According to usage, the word pria is frequently used in the public sphere, such as in the context of work and news discourse. Lelaki tends to be used more in the personal sphere, such as family, rather than in public settings. Jantan tends to be used in public discourse. The connotations of pria, lelaki, and jantan are neutral. This study successfully demonstrated the shift in Indonesian masculinity from traditional to new forms, indicating the impact of language studies on the analysis of masculinization in Indonesia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. 基于字符增强的工业设备故障命名实体识别 [Character-Enhanced Named Entity Recognition for Industrial Equipment Faults].
- Author
-
张阳 and 刘瑾
- Subjects
-
*LONG-term memory, *PLANT performance, *RANDOM fields, *INDUSTRIAL equipment, *INDUSTRIAL goods
- Abstract
To address the issues of sparse training data, complex entity structures, and uneven entity distribution in the industrial equipment failure domain, this study constructs an industrial equipment failure named entity recognition corpus. Due to the difficulty of character level named entity recognition models in representing the professional vocabulary information in the field of industrial equipment failure, this study proposes a character enhanced industrial equipment failure named entity recognition model to address this problem. In the embedding layer, professional vocabulary information is directly fused between the Transformer layers of RoBERTa-wwm (Robustly Optimized BERT Pretraining Approach with Whole Word Masking) to allocate word information to each of its constituent characters for enhanced semantics. The global semantic information is obtained through a BiLSTM (Bidirectional Long Short Term Memory), and the CRF (Conditional Random Field) is used to learn the dependency relationship between adjacent labels to obtain the optimal sentence level label sequence. Experimental results demonstrate that the proposed model has good performance on industrial equipment fault named entity recognition tasks, with an average F1 score of 92.403%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Authenticity of academic lecture passages in high-stakes tests: A temporal fluency perspective.
- Author
-
Nishizawa, Hitoshi
- Subjects
-
*ACADEMIC achievement, *STAKEHOLDERS, *ENGLISH language, *ENGLISH as a foreign language, *LECTURES & lecturing
- Abstract
Corpus-based studies have offered the domain definition inference for test developers. Yet, corpus-based studies on temporal fluency measures (e.g., speech rate) have been limited, especially in the context of academic lecture settings. This made it difficult for test developers to sample representative fluency features to create authentic listening passages. To address this issue, the Fluency Corpus of Academic English Lectures (FCAEL) was created to offer insight into the thresholds for temporal fluency features in academic lecture settings. The current study compared the corpus data to the academic lecture passages in the Test of English as a Foreign Language Internet-based test (TOEFL iBT) and International English Language Testing System (IELTS) to examine the domain definition inference of these tests. In total, 14 temporal fluency measures were examined. A bootstrapped one-way multivariate analysis of variance (MANOVA), followed by a series of bootstrapped analyses of variances (ANOVAs), independent t -test, and Tukey tests showed some support for the tests, although many limitations were also found. The study suggests the 25th–75th percentile of FCAEL as tentative thresholds for each temporal fluency feature. The proposal may be useful for test developers to create and revise test materials. Coding schemes, analysis codes, and raw corpus data are available on the project's Open Science Framework page, exemplifying how Open Science can provide benefits beyond the academic community. [ABSTRACT FROM AUTHOR]
- Published
- 2024
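10. Estudio comparativo del estilo de traducción de las Analectas (Lunyu): análisis basado en el corpus paralelo de las versiones de Pérez Arroyo y Suárez Girard [A comparative study of the translation style of the Analects (Lunyu): an analysis based on a parallel corpus of the Pérez Arroyo and Suárez Girard versions].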
- Author
-
Zuo Ya and Li Biwei
- Abstract
Copyright of CIRCULO de Linguistica Aplicada a la Comunicacion is the property of Universidad Complutense de Madrid and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
11. The Functional Features of Chunks in Journal Articles of Aquaculture.
- Author
-
Jiumei Xu and Fei Guo
- Subjects
RESEARCH, AQUACULTURE, CIVIL rights, ACADEMIC discourse, PERIODICAL articles
- Abstract
This study investigates the functional similarities and differences of four-word chunks in the academic discourse of aquaculture by Chinese and international scholars based on Hyland's functional classification method within a corpus-driven approach. The findings reveal that, compared to their international counterparts, Chinese scholars significantly utilize more four-word chunks. Functionally, Chinese scholars frequently employ quantification, structure, framing, and engagement chunks, underscoring the importance they assign to the logic of discourse and the interaction between authors and readers. The infrequent use of description chunks suggests that it is essential for Chinese scholars to fully appreciate the significance of describing research objects, methods, and results in order to convey the foundational and experimental nature of hard science research. Furthermore, the structures of chunks used by Chinese and international scholars to express the same discourse functions differ. The expression of data indication among Chinese scholars appears more solidified. These research results can offer valuable references for academic writing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. İlköğretim Ders Kitaplarının Söz Varlığı Bakımından İncelenmesi [An Examination of Primary School Textbooks in Terms of Vocabulary].
- Author
-
SAYIN, Hüseyin and DOĞAN, Yusuf
- Abstract
Copyright of Journal of Mother Tongue Education / Ana Dili Egitim Dergisi is the property of Journal of Mother Tongue Education / Ana Dili Egitim Dergisi and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
13. The Role of Culture in Abusive Language on Social Media: Examining the Use of English and Arabic Derogatory Terms.
- Author
-
Alshalabi, Nahla, Lahiani, Hanene, and Yasin, Ayman
- Subjects
INVECTIVE, ENGLISH language, CULTURAL values, SOCIAL norms, DISCOURSE analysis, TABOO
- Abstract
Although several studies have dealt with the use of derogatory terms on social media, only few compared the phenomenon across languages from a sociocultural aspect. This study used a mixed-method comparative analysis of 920 Arabic and English abusive tweets. The researchers used content analysis to annotate the tweets according to their type and severity. They also used qualitative thematic discourse analysis to interpret the linguistic themes. Furthermore, they used frequency analysis to statistically identify the most common targets and lexical items and to identify the sociolinguistic patterns behind them. The results reveal that Arabic tweets have higher frequencies of gender abusive terms, and they are more severe than the English ones. However, English showed greater reliance on vulgar terms because of cultural taboos. English communication was also dominated by implicit insults, while Arabic favored explicit offense in accordance with direct/indirect cultural values. Both languages used emojis intensively, but Arabic used more diverse registers within messages. Anonymity boosted prejudices for both languages. In conclusion, the difference in online toxicity between the languages is the result of linguistic differences and the cultural norms and the interaction between the two. [ABSTRACT FROM AUTHOR]
- Published
- 2024
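14. 冯友兰和华兹生《庄子》英语节译本的翻译风格对比研究 — 基于语料库统计与分析 [A comparative study of translation style in Feng Youlan's and Watson's abridged English translations of the Zhuangzi: based on corpus statistics and analysis].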
- Author
-
杜佩珏 and 刘性峰
- Abstract
Copyright of New Perspectives in Translation Studies is the property of New Perspectives in Translation Studies Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
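15. 政治语篇英译中人称代词的显化研究 — 以《习近平谈治国理政》的英译本为例 [Explicitation of personal pronouns in the English translation of political discourse: the case of the English version of Xi Jinping: The Governance of China].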
- Author
-
王舒曼 and 张广法
- Abstract
Copyright of New Perspectives in Translation Studies is the property of New Perspectives in Translation Studies Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
16. Corpus-based Analysis on Near-synonym Talk and Speak across Different Text Genres
- Author
-
Maharani Laksmi Anindita
- Subjects
collocation, corpus, near synonym, speak, talk, Language and Literature
- Abstract
Near synonyms are words in a certain language that have the same concept of meaning but cannot be used interchangeably in all contexts of use. Corpus data could be used to help understand the differences in the use of near synonymous words. This research aims to understand the distribution patterns of the near synonymous verbs ‘talk’ and ‘speak’ in different types of discourse as well as understand the collocation patterns of the verbs ‘talk’ and ‘speak’ in each discourse to understand the differences in the use of these two synonymous words. This research is both quantitative and qualitative and uses the British National Corpus (XML Edition) as the data source and CQPWeb as the analytical instrument. The analysis of the distribution and frequency of use of these verbs shows that the verb ‘talk’ is more widely used than ‘speak’, but each verb has different grammatical form preferences in each type of discourse. In addition, the collocation pattern shows that the verb ‘talk’ plays more of a role as the main predicator, while the verb ‘speak’ is more often used as an adverbial phrase. These results prove that these two words cannot completely substitute for each other in every context. These results can also be used as a reference in teaching the use of synonymous words in learning English as a second language.
- Published
- 2024
17. Parallel corpus in analysing Czech spoken expressions and their equivalents in English, French, and Polish
- Author
-
Adrian Jan Zasina
- Subjects
corpus, corpus-based exercises, czech, data-driven learning, discourse markers, speaking skills, spoken expressions, Philology. Linguistics, P1-1091
- Abstract
This paper uses corpus data to analyse spoken expressions and discourse markers in Czech, applying these findings to corpus-based exercises for learners of Czech as a foreign language. The analytical section highlights the usefulness of parallel corpus in identifying suitable translation equivalents for prevalent Czech spoken vocabulary in English, French, and Polish as native languages from the learner’s perspective. The methodology outlines the process of finding appropriate translation equivalents in film subtitles, considering both meaning and spoken register. The pedagogical section introduces three corpus-based exercises designed to improve conversational skills, featuring authentic texts that familiarise learners with spoken vocabulary. This research builds on previous studies of the English language that did not use parallel corpora to identify translation equivalents in learners’ native languages — an essential factor for understanding a foreign language. In addition, tailor-made corpus-based exercises can be seamlessly integrated into everyday classroom activities to enhance language awareness among non-native speakers.
- Published
- 2024
18. What ratings and corpus data reveal about the vividness of Mandarin ABB words
- Author
-
Thomas Van Hoey, Xiaoyu Yu, Tung-Le Pan, and Youngah Do
- Subjects
ABB, corpus, ideophone, iconicity, imagery, norms, ratings, vividness, Language and Literature, Consciousness. Cognition, BF309-499
- Abstract
A well-known method of studying iconic words is through the collection of subjective ratings. We collected such ratings regarding familiarity, iconicity, imagery/imageability, concreteness, sensory experience rating (SER), valence and arousal for Mandarin ABB words. This is a type of phrasal compound consisting of a prosaic syllable A and a reduplicated BB part, resulting in a vivid phrasal compound, for example, wù-mángmáng 雾茫茫 ‘completely foggy’. The correlations between the newly collected ABB ratings are contrasted with two other sets of prosaic word ratings, demonstrating that variables that characterize ABB words in an absolute sense may not play a distinctive role when contrasted with other types of words. Next, we provide another angle for looking at ABB words, by investigating to what degree rating data converges with corpus data. By far, the variable that characterizes ABB items consistently throughout these case studies is their high score for imageability, showing that they are indeed rightfully characterized as vivid. Methodologically, we show that it pays off to not take rating data at face value but to contrast it with other comparable datasets of a different phenomenon or data about the same phenomenon compiled in an ontologically different manner.
- Published
- 2024
19. Le corpus en sciences du langage, un lieu de vérification des enjeux langagiers [The corpus in the language sciences: a site for testing language issues]
- Author
-
Salem Ferhat
- Subjects
corpus, language sciences, choice and limits, homogeneity and representativeness, reliability of results, Romanic languages, PC1-5498, Language. Linguistic theory. Comparative grammar, P101-410
- Abstract
This article addresses the question of the corpus in the language sciences. It highlights the main aspects of building a well-considered corpus that can serve as a theoretical foundation. The discussion is a synthesis of approaches that enable novice researchers to build, delimit, and explore certain aspects of language in order to launch well-founded scientific research. It shows that the results of research in the language sciences should follow from the tangible data of a corpus before any new theory is formulated, or before an existing theory is called into question or even refuted. In terms of content, the article draws on the views of corpus specialists regarding the definition of the role of the corpus in linguistic studies and its selection, delimitation, homogeneity, representativeness, the nature of its components, and its constraints. The article progressively sets out the main facets to take into account when building a corpus so that researchers can ensure the reliability of their results.
- Published
- 2024
20. İlköğretim Ders Kitaplarının Söz Varlığı Bakımından İncelenmesi [An Examination of Primary School Textbooks in Terms of Vocabulary]
- Author
-
Yusuf Doğan and Hüseyin Sayın
- Subjects
söz varlığı, ilköğretim ders kitapları, kelime sıklığı, derlem, kelime listesi, lexicon, primary school textbooks, word frequency, corpus, word list, Education
- Abstract
The aim of this research is to present a situation analysis of the vocabulary in the textbooks used by students in compulsory courses during primary education. The research, conducted as a case study within a qualitative research method, was carried out on a sample of 48 textbooks from the 2018-2019 academic year; the data, collected through document review, were evaluated by content analysis. The 1,058,434 morphemic (raw) vocabulary items obtained from the textbooks were examined at the semantic-unit level and processed according to criteria previously established in the literature, in line with the aim of the research. The research found that, in the textbooks taught in compulsory courses, students encounter a total of 949,355 vocabulary items, 33,021 of which are distinct. Of these items, 864,345 (20,798 types) were identified as basic vocabulary, 170 (113 types) as proverbs, 16,252 (2,247 types) as idioms, 2,793 (636 types) as reduplications, 968 (197 types) as formulaic expressions, 43,792 (6,110 types) as proper nouns, 20,623 (2,733 types) as numerical expressions, and 412 (187 types) as vocabulary items from foreign languages. The 11,891 vocabulary items used only once in the textbooks made up approximately 1.25% of the total vocabulary in the textbooks, while the 21,129 items that recur more than once (937,464 occurrences in total) made up 98.75%. The 2,002 most frequent vocabulary items in the textbooks are provided in the appendix to the study.
- Published
- 2024
21. Pragmatic markers in contemporary poetry: A corpus-based discourse analysis
- Author
-
Olga V. Sokolova and Vladimir V. Feshchenko
- Subjects
pragmatics, poetic discourse, colloquial speech, discourse markers, corpus, Philology. Linguistics, P1-1091
- Abstract
Poetic discourse, which engages the poetic function of language as a constitutive one, transforms the postulates of pragmatics of ordinary language. New poetic practices often represent a kind of pragmatic experiment: the effect of linguopragmatic parameters inherent in conventional communication is tested here on the borderline between the norm and the anomaly. The aim of this study is to identify the specific functionality of pragmatic markers in the condition of increased permeability between discourses and to explore the features of trans-discourse interaction of poetic language and colloquial speech in new media. The study is based on a corpus of poetic texts (3 million words), including Russian, English, and Italian subcorpora. It identified new communicative strategies of addressing and clusters of deictic, modal and discourse markers, grouped according to Jakobson’s communicative model (Jakobson 1975). The study identified qualitative differences between the frequency of use of several units in poetic discourse and in colloquial speech. We considered various pragmatic strategies, referring not only to individual units, but also clusters of deictic, modal, and discourse words, etc. We found that Italian and Russian poetry uses discourse markers more often than American poetry. Differences in linguistic structure also affect the specifics of a pragmatic experiment. Thus, in American poetry, a pragmatic experiment often activates the syntactic level; in Russian poetry, experiments with word formation and modality are more frequent; in Italian poetry, the pragmatic experiment is often combined with the structural-syntactic one: pragmatic markers form “clusters” or “chains”, when an increase in the density of use of units leads to an increase in the range of deviations from standard usage. The research based on the poetic corpus of texts contributes to the study of poetic discourse and corpus pragmatics.
- Published
- 2024
22. Indonesian Women in Kamus Besar Bahasa Indonesia (KBBI) (1988–2018): A Lexicographic Corpus
- Author
-
Ria Febrina, Suhandano, and Adi Sutrisno
- Subjects
women, indonesian, corpus, lexicography, kbbi, Philology. Linguistics, P1-1091
- Abstract
Indonesian women have undergone significant changes over time, as reflected in the vocabulary of the Kamus Besar Bahasa Indonesia (KBBI). This study aims to describe the representation of Indonesian women in the KBBI and to explain the development of their social and cultural lives over 30 years (1988–2018). The research employs a descriptive-qualitative approach by collecting data through the extraction of entries, definitions, compound words, and proverbs containing the terms “perempuan” (woman) and “wanita” (lady) from two printed editions of the KBBI: the first edition (1988) and the fifth edition (2018). Data analysis was conducted using Sketch Engine to analyze 1,381,578 tokens, and the findings revealed 1,148 collocations and concordances related to the terms “perempuan” and “wanita.” The results indicate that the study of Indonesian women within a linguistic corpus offers insights into their contributions over 30 years across various fields such as religion, military, economy, journalism, health, politics, arts and culture, and beauty. Through corpus-lexicography studies, the portrayal of Indonesian women in the dictionary has challenged patriarchal views that traditionally positioned women as inferior to men. This research highlights the importance of recognizing the representation of women in the social dynamics of Indonesian society. It offers a significant contribution to the broader field of Indonesian lexicography studies by examining how women are represented in dictionary entries.
- Published
- 2024
23. PANDANGAN TENTANG PRAKTIK BEDAH PLASTIK DALAM ARTIKEL OPINI DI SURAT KABAR KOREA TAHUN 2010–2019 [Views on Plastic Surgery Practices in Opinion Articles in Korean Newspapers, 2010–2019]
- Author
-
Putu Pramania Adnyana, Abdul Muta’ali, and Sonya Puspasari Suganda
- Subjects
plastic surgery, korea, collocation, semantic preference, corpus, Ethnology. Social and cultural anthropology, GN301-674, Philosophy. Psychology. Religion
- Abstract
This research aims to reveal Koreans’ perspectives on plastic surgery practices through a corpus-based text analysis of Korean online newspapers. The research used quantitative and qualitative methods to observe keywords, significant collocates, and semantic preference. The data used were 106 opinion articles about plastic surgery in Chosun Ilbo and Donga Ilbo (2010–2019). Keyword analysis shows that the most frequently used keywords are seonghyeong (189 times) and seonghyeongsusul (116 times). Analysis of significant collocates and semantic preferences revealed that plastic surgery was frequently discussed in relation to body parts, people (collocate yeoseong (women), junggukin (Chinese), hwanja (patient), gwangganggek (tourist)), country (collocate jungguk (China), hangguk (Korea)), intensity (gwadohan (excessive)), and effect (phihae (loss)), and others. Plastic surgery clinics are frequently addressed in terms of location (collocate Gangnam, Seoul), intensity (collocate daehyong (big scale)), and others. Based on the findings of this study, plastic surgery is primarily performed on local women and international patients, particularly Chinese tourists. Plastic surgery is also perceived as having a negative impact, being overdone, and causing numerous difficulties. Plastic surgery performed on foreign patients is considered as beneficial to Korean economic progress, but it should not be performed on Koreans.
- Published
- 2024
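24. Zu Adjektivkomposita in der heutigen Sprache der Medizin auf der Grundlage der Fachzeitschrift „Deutsches Ärzteblatt“. Ein sprachlicher Schnappschuss [On adjective compounds in contemporary medical language, based on the professional journal "Deutsches Ärzteblatt": a linguistic snapshot]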
- Author
-
Anna Dargiewicz and Maciej Choromański
- Subjects
special language ,medical language ,word formation ,adjective compounds ,corpus ,Philology. Linguistics ,P1-1091 ,German literature ,PT1-4897 - Abstract
The linguistics of special languages, which deserves a great deal of attention, makes it possible to break down the various special languages almost to the ground. The position of special languages is undoubtedly relevant at the present time. By transferring specialized knowledge, special languages make it possible to define, among other things, various phenomena typical of a particular field. With the permanent development of the sciences, special languages are also evolving. One of the special languages that is developing with tremendous speed is the language of medicine, which is analyzed here. One particularly significant aspect of the study of special languages is word formation. The purpose of this article is to present adjectival compounding in German medical language. The theoretical part first presents an informative sketch of special languages in general, of medical language itself, and of adjective composition as a type of word formation. The empirical part, based on a corpus of issues of the professional journal “Deutsches Ärzteblatt”, investigates (using qualitative and quantitative methods) which A-constituents the adjective compounds have, how many components the retrieved adjective compounds consist of, how they are written, and which linking morphemes participate in their formation. The corpus analysis carried out in the article illuminates the direction of development of the German language of medicine in the area of word formation, here with regard to adjective composition, and provides important information on the structure, length, and spelling of the extracted adjective compounds.
- Published
- 2024
- Full Text
- View/download PDF
25. A corpus-based study on the "ungrammatical" aren't I.
- Author
-
Xiang, Mingyou and Jiang, Xiao
- Abstract
Concerning the "ungrammatical" interrogative form aren't I, many scholars have made their points. However, these scholars' arguments are based on their personal observations, and few studies have examined this phenomenon against large corpora. This study aimed at investigating the widespread usage of the "ungrammatical" contraction form aren't I in question tags from both quantitative and qualitative perspectives. Based on large corpora, this study showed a clear picture of the current frequency of use of the question tag aren't I and its alternatives (amn't I, ain't I, am I not, and an't I) in modern English. From a qualitative perspective, this study found that the reason why aren't I has taken hold as a recognized standard form around the globe lies in that the use of aren't I appears to be a smart coincidence to imply the potential double roles of "I" as both the addresser and the addressee in a monologue. In addition, the fact that amn't I is difficult to pronounce, am I not is bookish, an't I is old-fashioned, and ain't I can only be used in informal situations increases the popularity of aren't I. The findings of this study can justify the usage of "ungrammatical" aren't I as a natural norm in both British English and American English. These findings open new research avenues alongside pedagogical and sociolinguistic implications for other similar "ungrammatical" language phenomena. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. The C-ORAL-ESQ project: a corpus for the study of spontaneous speech of individuals with schizophrenia.
- Author
-
Raso, Tommaso, de Melo Rocha, Bruno Neves Rati, Salgado, João Vinícius, Cruz, Breno Fiuza, Mantovani, Lucas Machado, and Mello, Heliana
- Subjects
- *
PEOPLE with schizophrenia , *SPEECH , *FILES (Records) , *SOUND recordings , *PROSODIC analysis (Linguistics) - Abstract
This paper presents the C-ORAL-ESQ corpus project, which is dedicated to the study of the speech of individuals with schizophrenia. The main aim of the project is to investigate cognitive aspects of individuals with schizophrenia. This investigation is carried through the compilation of a spontaneous speech corpus and its study, which focuses mainly on the analysis of information structuring and its prosodic correlates. The paper mainly deals with the methodological aspects of the corpus compilation and reports its present stage: it informs about the environment and the setting of the sound file recordings, the medical and ethical criteria for the selection of the participants, the corpus aimed dimensions and the present stage of compilation, as well as its design and compilation criteria, which include attention to prosodic annotation, and metadata related to the participants' characteristics. Additionally, the theory adopted for the study of information structure is summarized, focusing on those aspects that can better address cognitive processes of individuals with schizophrenia and their prosodic correlates. Finally, the perspectives for future studies and resource compilations are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. The conceptual nature of the Turkish emotion term 'Heyecan'.
- Abstract
This study aims at investigating the Turkish emotion concept heyecan (i.e. thrill, excitement, and nervousness), which can be used with different semantic contents depending on the context. The conceptual metaphor theory frames this analysis to reveal the metaphorical and metonymical conceptualizations of heyecan. For this purpose, the lemma heyecan is searched in the Turkish National Corpus, and 700 concordance lines gathered from the corpus are examined through the metaphor identification procedure to identify the source domains and interpret the conceptual coding. The findings reveal a folk model of heyecan in which several metaphors and metonymies characterize different dimensions of it: arousal–existence–disappearance, intensity–passivity, control, cause–effect, and individual–social. Qualitative and quantitative findings embody various linguistic metaphors that can be grouped under several source domain categories including substance in a container, location, and object as the most frequent ones, whereas physiological effect is the most frequent metonymy. The metaphors and metonymies are discussed with their examples in this study. The concordance lines show several emotion terms that heyecan is collocated with, among which the emotion families of 'fear' and 'happiness' outnumber the rest. This study demonstrates how corpus data are helpful in pinpointing the conceptual content of an emotion term in a coherent way. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. CatCoLA, Catalan Corpus of Linguistic Acceptability.
- Author
-
Bel, Núria, Punsola, Marta, and Ruiz-Fernández, Valle
- Subjects
LANGUAGE models ,TRANSLATING & interpreting ,CATALAN language ,CORPORA ,LINGUISTIC models - Abstract
Copyright of Procesamiento del Lenguaje Natural is the property of Sociedad Espanola para el Procesamiento del Lenguaje Natural and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
29. Cross-Disciplinary Analysis of the Syntactic and Lexical Features of Chinese Master Thesis Titles.
- Author
-
Zhijie Wang, Abdul Jabar, Mohd Azidan, and Mohd Jalis, Farhana Muslim
- Subjects
SCHOLARLY communication ,UNIVERSITY rankings ,FOCUS (Linguistics) ,NOMINALS (Grammar) ,LINGUISTICS - Abstract
This study offered a detailed cross-disciplinary analysis of master thesis titles (MTTs) in the fields of Linguistics and Literature, focusing on the variations in title length, syntactic structure, and lexical features. Utilizing a corpus-based approach, the research analyzed 1,000 MTTs from 25 top universities in China, employing both quantitative and qualitative methods to explore how titles reflect disciplinary conventions. The quantitative analysis revealed that Linguistics titles were typically longer and utilized complex nominal structures with a higher lexical density of substantive words, emphasizing precision and detailed content communication. In contrast, Literature titles demonstrated greater syntactic diversity and lexical variety, reflecting a broader thematic scope and adaptability in narrative and thematic expressions. Qualitatively, the study highlighted how these features aligned with the distinct cultural and academic settings of each field. The findings suggested that while Linguistics titles focused on analytical depth, Literature titles incorporated more creative and interpretative elements. This research provided valuable insights into the construction of thesis titles and suggested practical applications for enhancing academic communication across disciplines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. A comparative analysis of Spanish Clinical encoder-based models on NER and classification tasks.
- Author
-
Subies, Guillem García, Jiménez, Álvaro Barbero, and Fernández, Paloma Martínez
- Abstract
Objectives: This comparative analysis aims to assess the efficacy of encoder Language Models for clinical tasks in the Spanish language. The primary goal is to identify the most effective resources within this context. Importance: This study highlights a critical gap in NLP resources for the Spanish language, particularly in the clinical sector. Given the vast number of Spanish speakers globally and the increasing reliance on electronic health records, developing effective Spanish language models is crucial for both clinical research and healthcare delivery. Our work underscores the urgent need for specialized encoder models in Spanish that can handle clinical data with high accuracy, thus paving the way for advancements in healthcare services and biomedical research for Spanish-speaking populations. Materials and Methods: We examined 17 distinct corpora with a focus on clinical tasks. Our evaluation centered on Spanish Language Models and Spanish Clinical Language models (both encoder-based). To ascertain performance, we meticulously benchmarked these models across a curated subset of the corpora. This extensive study involved fine-tuning over 3000 models. Results: Our analysis revealed that the best models are not clinical models, but general-purpose models. Also, the biggest models are not always the best ones. The best-performing model, RigoBERTa 2, obtained an average F1 score of 0.880 across all tasks. Discussion: Our study demonstrates the advantages of dedicated encoder-based Spanish Clinical Language models over generative models. However, the scarcity of diverse corpora, mostly focused on NER tasks, underscores the need for further research. The limited availability of high-performing models emphasizes the urgency for development in this area. Conclusion: Through systematic evaluation, we identified the current landscape of encoder Language Models for clinical tasks in the Spanish language. While challenges remain, the availability of curated corpora and models offers a foundation for advancing Spanish Clinical Language models. Future efforts in refining these models are essential to elevate their effectiveness in clinical NLP. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
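31. 国际中文教育专硕生利用语料库进行学位论文写作的调查与分析. [A survey and analysis of corpus use in thesis writing by professional master's students in international Chinese language education]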
- Author
-
王昌宇 and 刘运同
- Abstract
Copyright of International Journal of Chinese Language Teaching is the property of Clifford Publishing.
- Published
- 2024
- Full Text
- View/download PDF
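32. 基于对比的中高级水平汉语二语学习者笔语词汇书面语特征的多维度考察. [A contrast-based, multidimensional investigation of written-register features in the written vocabulary of intermediate and advanced L2 Chinese learners]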
- Author
-
张江丽
- Abstract
Copyright of International Journal of Chinese Language Teaching is the property of Clifford Publishing.
- Published
- 2024
- Full Text
- View/download PDF
33. Stance detection in Arabic with a multi-dialectal cross-domain stance corpus.
- Author
-
Charfi, Anis, Bessghaier, Mabrouka, Atalla, Andria, Akasheh, Raghda, Al-Emadi, Sara, and Zaghouani, Wajdi
- Abstract
We present a cross-domain and multi-dialectal stance corpus for Arabic, covering the major dialect groups and four Arab regions. This research provides an important language resource for automating the task of stance detection in Dialectal Arabic while carefully considering the subtle differences in stance expression across various dialects. More than 4500 sentences in our corpus have been carefully annotated according to their stance with regard to a certain subject. We gathered sentences associated with two controversial topics for every region and we had at least two annotators annotate each sentence to indicate if the author is supporting, opposing, or neutral to the sentence's topic. Our corpus shows high balance between dialect and stance. About half of the sentences in each region are written in Modern Standard Arabic, while the other half are written in the specific dialect of that region. To evaluate our corpus, we performed a number of machine-learning experiments for the stance detection task. The best performance was achieved by AraBERT with an accuracy and an F1-score of 0.82. Furthermore, we trained and tested this model on the most similar state-of-the-art stance dataset, "MAWQIF". The comparison results demonstrate how crucial it is to maintain balance among the three stance classes in our dataset. In particular, the model scored better when using our stance corpus than when using the MAWQIF dataset especially for the "Neutral" stance class. Using our best performing model, we developed a Web-based demonstrator for stance detection in dialectal Arabic and we show its effectiveness in analyzing stance in the context of two real-world scenarios: product boycott in the Arab world and customer reviews of a soft drink company. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. 日本 “一带一路” 新闻话语的批评隐喻分析. [A critical metaphor analysis of Japanese news discourse on the "Belt and Road" initiative]
- Author
-
孙成志 and 晋鑫哲
- Abstract
Copyright of Journal of Beijing International Studies University is the property of Beijing International Studies University.
- Published
- 2024
- Full Text
- View/download PDF
35. 基于子音节表征的苗语语音合成方法. [A Hmong speech synthesis method based on sub-syllable representation]
- Author
-
蔡姗, 王林, 谭棉, 郭胜, 吴磊, and 王飞
- Abstract
Speech synthesis of minority languages contributes to the preservation, protection and development of national culture, while the research results in this field are currently limited. To address the problem of speech synthesis errors where words with different tones sound similar, a sub-syllable representation-based text-to-speech method for the Hmong language was proposed. The method utilized sub-syllables as training primitives to accurately represent the pronunciation information of the Hmong language, enabling distinctive learning of similar sounds across different syllables. According to the monotonicity of alignment between text sequence and Mel-spectrogram, a monotonic alignment loss was introduced to guide the attention module to learn alignment more accurately, thereby reducing synthesis phenomena such as word skipping and repetition inherent in the autoregressive attention mechanism. To verify the effectiveness of the proposed method, a self-built Hmong language speech synthesis corpus, HmongSpeech (download link: http://sxjxsf.gzmu.edu.cn/info/1728/1214.htm), was utilized as the benchmark dataset. Comparative experiments were conducted with typical speech synthesis methods. The experimental results show that the proposed method successfully reduces the synthetic error rate caused by the similar pronunciation of words with different tones. Notably, the word error rate is only 0.96%, outperforming the baseline method by 6.25%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Improving completeness and consistency of co-reference annotation standard.
- Author
-
Xu, Yang, Farha, Fadi, Wan, Yueliang, Xu, Jiabo, Liu, Hong, and Ning, Huansheng
- Subjects
- *
ANNOTATIONS , *NATURAL language processing , *CLOUD storage , *DATA warehousing , *COMPUTER performance , *INTELLIGENT personal assistants - Abstract
As the processing power of mobile terminals increases, wireless network applications such as voice assistants can put more context-sensitive tasks on the mobile terminals, thus reducing the wireless network bandwidth needed and the cost of data storage in the cloud. Co-reference annotation, identifying the same semantics in context, is one of the critical techniques in these tasks. However, there are some problems with the existing co-reference annotation standards. First, the annotation is incomplete. Second, the types of annotated mentions are inconsistent. Third, there are currently no metrics for the above characteristics. Analyzing the above-mentioned issues, this paper proposes a new co-reference annotation standard. The new standard can annotate more semantics and co-reference relations and only adopts two types of mentions for annotation. Meanwhile, this paper presents a performance evaluation corpus and designs three performance metrics for evaluating the new standard according to the completeness of semantic annotation, the completeness of co-reference annotation, and the consistency of mention. The experiment shows that the new standard outperforms all the baseline methods and achieves 0.95 in the completeness of semantic annotation, 0.68 in the completeness of co-reference annotation, and 0.57 in the consistency of types of mentions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. FASSILA: A Corpus for Algerian Dialect Fake News Detection and Sentiment Analysis.
- Author
-
Abdedaiem, Amin, Dahou, Abdelhalim Hafedh, Cheragui, Mohamed Amine, and Mathiak, Brigitte
- Subjects
MACHINE learning ,SENTIMENT analysis ,DEEP learning ,FAKE news ,DIALECTS - Abstract
In the context of low-resource languages, the Algerian dialect (AD) faces challenges due to the absence of annotated corpora, hindering its effective processing, notably in Machine Learning (ML) applications reliant on corpora for training and assessment. This study outlines the development process of a specialized corpus for Fake News (FN) detection and sentiment analysis (SA) in AD called FASSILA. This corpus, comprising 10,087 sentences and over 19,497 unique words in AD, addresses the language's significant lack of linguistic resources and covers seven distinct domains. We propose an FN detection and SA annotation scheme detailing the data collection, cleaning, and labeling. The remarkable Inter-Annotator Agreement indicates that the annotation scheme produces high-quality and consistent annotations. Subsequent classification experiments using BERT-based and ML models are presented, demonstrating promising results and highlighting avenues for further research. The dataset is currently freely available to facilitate future advancements in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Oxymoron: An Automatic Detection from the Corpus.
- Author
-
Senapati, Apurbalal
- Subjects
LANGUAGE models ,NATURAL language processing ,BENGALI language ,CORPORA ,COMPUTATIONAL linguistics - Abstract
An oxymoron is a linguistic phenomenon in which a pair of opposite or antonymous words are combined to convey a new meaning. Sometimes, it is used to express figurative meaning, irony, or rhetoric within the text. This issue has received relatively little attention in the realms of linguistics and computational disciplines. Oxymorons play a significant role in various language-processing applications. This study represents a pioneering effort in the exploration of oxymorons in the Bengali language. A corpus-based study of oxymorons is a fundamental issue that has not been explored so far. A system has been proposed for the automated recognition of oxymorons from a given corpus. Frequency analysis, semantic similarity, and an antonym dictionary have been employed to discern oxymorons within the corpus. The system achieved promising results when tested on a Bengali corpus, and found 308 distinct oxymorons. Corpus-based descriptive statistics are measured in two different corpora. The most common oxymorons are ranked based on their frequency. Their notable presence underscores the importance of the Bengali language. This study aimed to explore fundamental questions concerning oxymorons, such as the automated detection of oxymorons within a corpus, descriptive statistics regarding oxymorons across languages, and the process of their construction and creation. Additionally, efforts were made to extract oxymorons from large language models using zero-shot prompts, but the results were not as promising as those of our proposed system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. A Pilot Study of Neologisms in Kuwaiti English.
- Author
-
Alenezi, Mohammad Abdulaziz and Al-Qenaie, Shamlan Dawood
- Published
- 2024
- Full Text
- View/download PDF
40. مصادر الغموض النحوي في اللغة العربية المعاصرة : دراسة مدونة. [Sources of syntactic ambiguity in contemporary Arabic: a corpus study]
- Author
-
دريسي عثمان عبد ا, حسين محمد ياغي, and مجدي شاكر الصوال
- Abstract
Copyright of Jordanian Educational Journal is the property of Association of Arab Universities.
- Published
- 2024
- Full Text
- View/download PDF
41. Do They Write Differently? Exploring Gendered Linguistic Differences in Academic Writings of Saudi Writers.
- Author
-
Ali, Sadia and Abdulhaleem, Ebtesam
- Subjects
GENDER differences (Sociology) ,ACADEMIC discourse ,MALE friendship ,MALE authors - Abstract
This study examines linguistic differences between male and female academic writing in Saudi Arabia, focusing on published research papers. Using Biber's multidimensional analysis as a model, the study examines both male and female authors' inherent lexical and grammatical preferences. A dataset of 20 research papers from each gender was tagged to analyze the linguistic features. ANOVA analyses were then conducted to identify patterns and variations. The research study provides interesting perspectives on the complex relationship between language and gender in academic settings. Though there are some similarities in the use of lexico-grammatical features between male and female research papers, noticeable differences suggest that gendered perspectives have an impact on scholarly writing. Both male and female research papers fall on the same polarity of the continuum across all five dimensions but with varying degrees. The findings suggest that male research writers tend to use more informational, explicit, and non-argumentative language while using less non-narrative and abstract discourse than their female counterparts. This study emphasizes how gender impacts the linguistic choices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. 作为 “容器” 的网红——基于语料库的“网红”概念史研究.
[The internet celebrity ("wanghong") as a "container": a corpus-based study of the conceptual history of "wanghong"]
- Author
-
赵珞琳 and 毛婉怡
- Abstract
Copyright of Publishing Journal is the property of Wuhan University, School of Information Science.
- Published
- 2024
43. 语料库驱动的通用汉语学术词表构建. [Corpus-driven construction of a general Chinese academic word list]
- Author
-
高 松, 钱 隆, and 丁 芊
- Abstract
Copyright of China Terminology is the property of China Terminology Magazine.
- Published
- 2024
- Full Text
- View/download PDF
44. A Corpus-based Study on Acquisition Mode of Continuation Task's Collocational Constructions of Circumstantial Shell Nouns.
- Author
-
Yu Qiang and Geraldine S. Wakat
- Subjects
COLLOCATION (Linguistics) ,ENGLISH fiction ,ENGLISH language ,LANGUAGE acquisition ,NOUNS - Abstract
This research is built upon a custom-developed continuation task mini corpus, incorporating New Concept English and English novel corpora, the native-speaker LOCNESS corpus, and the COCA corpus. Utilizing the Contrastive Interlanguage Approach, it investigates the frequency, proportion, and Mutual Information (MI) values of the collocational constructions of five circumstantial shell nouns (environment, place, background, situation, position). The study reveals that students exhibit varied proficiency levels in specific collocational constructions, underscoring a disparity between instructional content and real-world language usage. English instruction should thus focus on students' language application in specific contexts, employ a diverse range of teaching materials, and enhance the teaching of key lexical collocations. Educators should merge classroom teaching with practical language use to more effectively guide students' language acquisition. Moreover, the significance of knowledge about collocational constructions of shell nouns in English learning cannot be overlooked. Teaching strategies should be flexibly adjusted to meet students' specific needs, enhancing their deep understanding and effective application of English vocabulary collocations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. TÜRKÇEDE UMUT DÜĞÜM SÖZCÜĞÜNÜN ÖRÜNTÜSEL GÖRÜNÜMLERİ: DERLEM TEMELLİ BİR ÇALIŞMA. [Pattern profiles of the node word UMUT ("hope") in Turkish: a corpus-based study]
- Author
-
KOÇ, Yasemin, ÇALIŞKAN, Fatma, ÖZDEMİR, Damla, TOKTAY, Müjgan, and GÜNDOĞDU, Ayşe Eda
- Subjects
COMPOUND words ,COLLOCATION (Linguistics) ,THEMATIC analysis ,IDIOMS ,VOCABULARY ,PROVERBS - Abstract
Copyright of Mersin University Journal of Linguistics & Literature / Mersin Üniversitesi Dil ve Edebiyat Dergisi is the property of Mersin University.
- Published
- 2024
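46. Equilíbrio entre intenção comunicativa e adoção de estratégias de um bilingue na interpretação simultânea: um estudo de caso. [Balance between communicative intention and strategy adoption by a bilingual in simultaneous interpreting: a case study]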
- Author
-
TAO CHEN, LILI HAN, REGO, VÂNIA, and JIAJIA SUI
- Subjects
INTENTION ,TRANSLATORS ,COMMUNICATION strategies ,CORPORA - Abstract
Copyright of Etudes Romanes de Brno is the property of Masaryk University, Faculty of Arts.
- Published
- 2024
- Full Text
- View/download PDF
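47. Estudio comparativo sobre el estilo de traducción de las analectas (lunyu): análisis basado en el corpus paralelo que compila las versiones de Pérez Arroyo y de Suárez Girard [A comparative study of translation style in the Analects (Lunyu): an analysis based on a parallel corpus compiling the versions by Pérez Arroyo and Suárez Girard]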
- Author
-
Ya Zuo and Li Biwei
- Subjects
corpus ,Analects ,translation style ,Chinese-Spanish translation ,Philology. Linguistics ,P1-1091 - Abstract
The study of translation style is gaining increasing importance in the field of Corpus-Based Translation Studies (Estudios de la Traducción Basados en Corpus, ETBC). This research compares the translations of the Analects (Lunyu) of Confucius by Pérez Arroyo (1981; 1999) and Suárez Girard (1997). To that end, a Chinese-Spanish parallel corpus is built which, at the lexical, syntactic, and discursive levels, makes it possible to analyze the respective stylistic profiles of the two versions and to reveal their similarities and particularities, as well as the underlying reasons. This corpus, which combines quantitative and qualitative methods, gives a better understanding of the stylistic characteristics of these works and the problems they present. The results serve as a useful and indispensable resource for future translations of the Confucian classic.
- Published
- 2024
- Full Text
- View/download PDF
48. Catálogo de adaptaciones en el nuevo milenio: de la literatura al cine británico (2001-2020) [A catalogue of adaptations in the new millennium: from literature to British cinema (2001-2020)]
- Author
-
Irene Romero González
- Subjects
film adaptation ,English literature ,comparative analysis ,corpus ,Literature (General) ,PN1-6790 ,Fine Arts - Abstract
An analysis of the evolution of British cinema between 2001 and 2020, highlighting the essential role of literary adaptations, the influence of television on the production of these adaptations, and the cultural impact of these works.
- Published
- 2024
- Full Text
- View/download PDF
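49. La Campiña de Córdoba: fonología y morfología El caso de Santaella (1740-1820): entre España y América [The Campiña de Córdoba: phonology and morphology. The case of Santaella (1740-1820): between Spain and America]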
- Author
-
Javier Puerma Bonilla
- Subjects
phonology ,morphology ,corpus ,dialectology ,Campiña de Córdoba ,variation ,Philology. Linguistics ,P1-1091 - Abstract
This paper aims to advance the analysis of the phonology and morphology of the Campiña de Córdoba between 1740 and 1820, based on an epistolary corpus of communicative proximity belonging to a single family line in the town of Santaella. We study the vocalism, the consonantism, and the form -(s-z)g- in verbs ending in -(e)cer and -cir. We establish a theoretically and empirically grounded dialogue in order to determine the reach of the features of our corpus within the Spanish-speaking world of the period and, on that basis, to discuss the appropriateness of some umbrella labels frequently used to describe these features in the history of Spanish.
- Published
- 2024
- Full Text
- View/download PDF
50. Russian Feminitives on Social Media: A Corpus-Aided Approach for Their Analysis
- Author
-
Shishebarova, Yulia S., Sokolova, Lyubov A., Mamaev, Ivan D., Brilly, Mitja, Advisory Editor, Hoalst-Pullen, Nancy, Advisory Editor, Leitner, Michael, Advisory Editor, Patterson, Mark W., Advisory Editor, Veress, Márton, Advisory Editor, Bakaev, Maxim, editor, Bolgov, Radomir, editor, Chugunov, Andrei V., editor, Pereira, Roberto, editor, R, Elakkiya, editor, and Zhang, Wei, editor
- Published
- 2024
- Full Text
- View/download PDF