Development of a method for the recognition of author’s style in the Ukrainian language texts based on linguometry, stylemetry and glottochronology

Authors :
Igor Bobyk
Dmytro Uhryn
Victoria Vysotska
Petro Pukach
Vasyl Lytvyn
Source :
Eastern-European Journal of Enterprise Technologies. 4:10-19
Publication Year :
2017
Publisher :
Private Company Technology Center, 2017.

Abstract

We developed algorithmic software for content monitoring that recognizes the style of the author of a Ukrainian text, based on Web Mining and NLP technology. The method for recognizing an author’s style, which is based on analysis of the stop words found in the text, was decomposed into its stages. A specific feature of the method is that the morphological and syntactic analysis of lexical units is adapted to the structural peculiarities of Ukrainian words and texts. Syntactic words (stop words or anchor words) are the ones that are significant for an author’s individual style, because they are not related to the theme or content of the publication. Recognition of the author’s style is based on analysis of the coefficients of the author’s lexical language: coherence of speech, lexical diversity, syntactic complexity, and the indices of concentration and exclusivity for the author’s fragment. These coefficients are then compared in order to determine the degree to which the analyzed text belongs to a particular author. We studied the internal "dynamics" of texts by randomly selected authors by analyzing the coefficients of the author’s lexical language for the first k, n and m words (without the title) of the author’s fragment and of the analyzed fragment, and compared the results obtained. We also obtained results of experimental testing of the proposed content-monitoring method for determining and analyzing stop words in Ukrainian scientific texts in the technical area, based on Web Mining technology. For the selected experimental base of 100 works, analyzing an article without the compulsory initial information and the list of references gives the best results by the density criterion. This is achieved by training the system and by checking the specified blocked words and the specified thematic vocabulary.
Testing the proposed method for determining keywords on other categories of texts (scientific texts in the humanities, belles-lettres, journalistic texts, etc.) requires further experimental research.
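
The stop-word analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. The stop-word list `STOP_WORDS` is a tiny hypothetical sample, lexical diversity is taken as the ratio of distinct words to all words, and the functions `tokenize`, `style_profile` and `profile_distance` are hypothetical names. The parameter `k` mimics the idea of analyzing only the first k words of a fragment.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word sample; a real list would be far larger.
STOP_WORDS = {"і", "та", "в", "на", "що", "не", "з", "до", "як", "це"}


def tokenize(text):
    """Lowercase the text and return its words (letters, inner apostrophes/hyphens)."""
    return re.findall(r"[^\W\d_]+(?:['’ʼ-][^\W\d_]+)*", text.lower())


def style_profile(text, k=None):
    """Compute simple style features for the first k words (all words if k is None).

    Assumed definitions (not taken from the paper):
      lexical_diversity = distinct words / all words
      stop_word_share   = stop-word occurrences / all words
      stop_word_freq    = relative frequency of each stop word that occurs
    """
    words = tokenize(text)
    if k is not None:
        words = words[:k]
    total = len(words)
    if total == 0:
        return {"lexical_diversity": 0.0, "stop_word_share": 0.0, "stop_word_freq": {}}
    counts = Counter(words)
    stop_counts = {w: counts[w] for w in STOP_WORDS if counts[w]}
    return {
        "lexical_diversity": len(counts) / total,
        "stop_word_share": sum(stop_counts.values()) / total,
        "stop_word_freq": {w: c / total for w, c in stop_counts.items()},
    }


def profile_distance(a, b):
    """Sum of absolute differences between two stop-word frequency profiles.

    A smaller value means the two fragments use stop words more similarly.
    """
    fa, fb = a["stop_word_freq"], b["stop_word_freq"]
    return sum(abs(fa.get(w, 0.0) - fb.get(w, 0.0)) for w in set(fa) | set(fb))


if __name__ == "__main__":
    reference = style_profile("Це кіт і це пес")
    candidate = style_profile("Це не кіт, це пес на дивані", k=5)
    print(reference["lexical_diversity"], profile_distance(reference, candidate))
```

Under these assumptions, a fragment would be attributed to the author whose reference profile lies closest to it. The other coefficients named in the abstract (coherence of speech, syntactic complexity, indices of concentration and exclusivity) would need their definitions from the full text before they could be added.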

Details

ISSN :
1729-4061 and 1729-3774
Volume :
4
Database :
OpenAIRE
Journal :
Eastern-European Journal of Enterprise Technologies
Accession number :
edsair.doi...........ded028c162d07964548648788de49ba5
Full Text :
https://doi.org/10.15587/1729-4061.2017.107512