Development of a method for the recognition of author’s style in the Ukrainian language texts based on linguometry, stylemetry and glottochronology
- Source :
- Eastern-European Journal of Enterprise Technologies. 4:10-19
- Publication Year :
- 2017
- Publisher :
- Private Company Technology Center, 2017.
Abstract
- We developed algorithmic software for content monitoring that recognizes the style of the author of a Ukrainian-language text using Web Mining and NLP technology. The method for recognizing an author’s style, based on analysis of the stop words found in the text, was decomposed into its constituent steps. A specific feature of the method is that the morphological and syntactic analysis of lexical units is adapted to the structural peculiarities of Ukrainian words and texts. It is syntactic words (stop words, or anchor words) that are significant for an author’s individual style, because they are not related to the theme or content of the publication. Recognition of the author’s style is based on analysis of the coefficients of the author’s lexical language: coherence of speech, lexical diversity, syntactic complexity, and the indices of concentration and exclusivity for the author’s fragment. These coefficients are then compared to determine the degree to which the analyzed text belongs to a particular author. We studied the internal "dynamics" of texts by randomly selected authors by analyzing these coefficients for the first k, n and m words (excluding the title) of the author’s fragment and of the analyzed fragment, and compared the results. We also report experimental testing of the proposed content-monitoring method for identifying and analyzing stop words in Ukrainian scientific texts in the technical field, based on Web Mining technology. For the selected experimental base of 100 works, analyzing an article without its compulsory initial information and list of references gives the best results by the density criterion. This is achieved by training the system and by checking specified blocked words and a specified thematic vocabulary. Testing the proposed method for determining keywords in other categories of texts (scientific texts in the humanities, belles-lettres, journalism, etc.) requires further experimental research.
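The kind of prefix-based comparison the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the use of a type-token ratio as "lexical diversity", the L1 distance between stop-word profiles, and all function names (`tokenize`, `lexical_diversity`, `stop_word_profile`, `profile_distance`, `prefix_features`) are assumptions made for illustration, since the abstract does not give the formulas.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word list for illustration only;
# it is NOT the list used in the paper.
STOP_WORDS = {"і", "та", "в", "на", "що", "не", "з", "до", "це", "як"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, with inner apostrophes)."""
    return re.findall(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*", text.lower())


def lexical_diversity(tokens):
    """Type-token ratio: distinct words / total words (an assumed definition)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def stop_word_profile(tokens, stop_words=STOP_WORDS):
    """Relative frequency of each stop word among all tokens."""
    counts = Counter(t for t in tokens if t in stop_words)
    n = len(tokens)
    return {w: (counts[w] / n if n else 0.0) for w in sorted(stop_words)}


def profile_distance(p, q):
    """L1 distance between two stop-word profiles over the same stop-word set
    (an assumed similarity measure; smaller means more alike)."""
    return sum(abs(p[w] - q[w]) for w in p)


def prefix_features(text, k):
    """Features computed on the first k words of a text (title assumed already removed)."""
    tokens = tokenize(text)[:k]
    return lexical_diversity(tokens), stop_word_profile(tokens)
```

Under these assumptions, calling `prefix_features` on a reference fragment and on an analyzed fragment for several prefix lengths, then comparing the resulting profiles with `profile_distance`, gives a rough picture of how the stop-word usage of the two texts evolves.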
- Subjects :
- Vocabulary
Computer science
media_common.quotation_subject
0211 other engineering and technologies
Energy Engineering and Power Technology
Lexical diversity
02 engineering and technology
computer.software_genre
Industrial and Manufacturing Engineering
Style (sociolinguistics)
0203 mechanical engineering
Management of Technology and Innovation
021105 building & construction
Electrical and Electronic Engineering
media_common
Parsing
Stop words
business.industry
Applied Mathematics
Mechanical Engineering
Computer Science Applications
Quantitative linguistics
020303 mechanical engineering & transports
Web mining
Control and Systems Engineering
Artificial intelligence
business
computer
Coherence (linguistics)
Natural language processing
Details
- ISSN :
- 1729-4061 and 1729-3774
- Volume :
- 4
- Database :
- OpenAIRE
- Journal :
- Eastern-European Journal of Enterprise Technologies
- Accession number :
- edsair.doi...........ded028c162d07964548648788de49ba5
- Full Text :
- https://doi.org/10.15587/1729-4061.2017.107512