Methods of corpus analysis of the frequency and dynamics of language units in 2024
- Author
Ekaterina Borzenko
- Subjects
diachrony, microdiachrony, nanodiachrony, linguistic corpus, the Russian National Corpus, GICR, “natural corpus”, search methods, frequency in language, language statistics, Philology. Linguistics, P1-1091, Literature (General), PN1-6790
- Abstract
The article “Methods of corpus analysis of the frequency and dynamics of language units in 2024” presents possible methods and sources for searching Russian language units and for analyzing their frequency and diachrony (microdiachrony, nanodiachrony). It covers well-known linguistic corpora such as the Russian National Corpus (RNC) and the General Internet Corpus of Russian (GICR), the “natural corpora” of Yandex (including Yandex.News) and Google, and other search sources such as LiveJournal, VKontakte and Telegram. The author also points to the possibility of using official English-language corpora when studying neologisms borrowed into modern Russian. The pros and cons of each service are discussed, with special attention to the search tools that allow a linguist to refine the data. The article also considers search services for databases of digitized books: Google Books (including the Google Books Ngram Viewer) and the electronic catalog of the Russian State Library. It further presents services that show user interest in a language unit (a word or phrase): Yandex.Wordstat and Google Trends. All or almost all of the services analyzed in the article have already been used in some form in published linguistic research. However, it should be kept in mind, first, that an adequate analysis often requires several sources at once and, second, that the choice of priority sources depends on the type of information a linguist is looking for. The article concludes with a possible order in which to use these services for different categories of words: long-established words, words of the Runet era, very new and rare words, and words whose frequency must be determined by half-year or by month. It also argues that a linguist needs to be able to use each of these tools.
- Published
2024