Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward.

Authors :
Abebe, Berhane
Chebunin, Mikhail
Kovalevskii, Artyom
Source :
Journal of Quantitative Linguistics. Feb 2024, Vol. 31, Issue 1, pp. 1-18 (18 pages).
Publication Year :
2024

Abstract

The paper develops a new statistical approach to automatically partitioning texts into parts written by different authors. It is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of these processes rests on an elementary probability model with a change point. We prove consistency of our statistical estimate of the concatenation point in the case when the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages. Testing shows a good estimate of the concatenation point. This method can be used in parallel with other text segmentation methods. [ABSTRACT FROM AUTHOR]
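
As an illustration of the counting processes the abstract mentions, here is a minimal sketch, assuming whitespace-separated lowercase tokens. The function names `distinct_counts_forward` and `distinct_counts_backward` are hypothetical and do not come from the paper. The sketch only builds the two processes; it does not reproduce the authors' change-point estimator, which this record does not describe.

```python
from typing import List, Sequence


def distinct_counts_forward(words: Sequence[str]) -> List[int]:
    """Entry k is the number of different words among the first k+1 tokens."""
    seen = set()
    counts = []
    for word in words:
        seen.add(word)
        counts.append(len(seen))
    return counts


def distinct_counts_backward(words: Sequence[str]) -> List[int]:
    """Entry k is the number of different words among the last k+1 tokens."""
    # Assumption: "backward" means reading the same text from its end.
    return distinct_counts_forward(list(reversed(words)))


# Toy input, purely illustrative (not data from the paper).
tokens = "a b a c b d".split()
forward = distinct_counts_forward(tokens)
backward = distinct_counts_backward(tokens)
```

If the two halves of a concatenated text differ in vocabulary growth, the forward and backward curves would be expected to have different shapes on either side of the concatenation point. How that difference is turned into an estimate is the subject of the paper itself.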

Details

Language :
English
ISSN :
0929-6174
Volume :
31
Issue :
1
Database :
Academic Search Index
Journal :
Journal of Quantitative Linguistics
Publication Type :
Academic Journal
Accession number :
176656881
Full Text :
https://doi.org/10.1080/09296174.2023.2275342