A Supervised Learning Approach for Authorship Attribution of Bengali Literary Texts
- Source :
- ACM Transactions on Asian and Low-Resource Language Information Processing; August 2017, Vol. 16, Issue 4, pp. 1-15 (15 pages)
- Publication Year :
- 2017
Abstract
- Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this article, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features and investigated the possibility of using topic-modeling-inspired features, to classify documents according to their authors. We have created a corpus from nearly all the literary works of three eminent Bengali authors, consisting of 3,000 disjoint samples. Our models showed better performance than the state-of-the-art, with more than 98% test accuracy for the shallow features and 100% test accuracy for the topic-based features. Further experiments with GloVe vectors [Pennington et al. 2014] showed comparable results, but flexible patterns based on content words and high-frequency words [Schwartz et al. 2013] failed to perform as well as expected.
Details
- Language :
- English
- ISSN :
- 2375-4699 and 2375-4702
- Volume :
- 16
- Issue :
- 4
- Database :
- Supplemental Index
- Journal :
- ACM Transactions on Asian and Low-Resource Language Information Processing
- Publication Type :
- Periodical
- Accession number :
- ejs43025252
- Full Text :
- https://doi.org/10.1145/3099473