A Supervised Learning Approach for Authorship Attribution of Bengali Literary Texts

Authors :: Phani, Shanta; Lahiri, Shibamouli; Biswas, Arindam
Source :: ACM Transactions on Asian and Low-Resource Language Information Processing; August 2017, Vol. 16 Issue: 4 p1-15, 15p
Publication Year :: 2017
Abstract :: Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this article, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features and investigated the possibility of using topic-modeling-inspired features, to classify documents according to their authors. We have created a corpus from nearly all the literary works of three eminent Bengali authors, consisting of 3,000 disjoint samples. Our models showed better performance than the state-of-the-art, with more than 98% test accuracy for the shallow features and 100% test accuracy for the topic-based features. Further experiments with GloVe vectors [Pennington et al. 2014] showed comparable results, but flexible patterns based on content words and high-frequency words [Schwartz et al. 2013] failed to perform as well as expected.

Language :: English
ISSN :: 2375-4699 and 2375-4702
Volume :: 16
Issue :: 4
Database :: Supplemental Index
Journal :: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Type :: Periodical
Accession number :: ejs43025252
Full Text :: https://doi.org/10.1145/3099473
