
A Supervised Learning Approach for Authorship Attribution of Bengali Literary Texts

Authors :
Phani, Shanta
Lahiri, Shibamouli
Biswas, Arindam
Source :
ACM Transactions on Asian and Low-Resource Language Information Processing; August 2017, Vol. 16 Issue: 4 p1-15, 15p
Publication Year :
2017

Abstract

Authorship attribution is a long-standing problem in natural language processing, and several statistical and computational methods have been applied to it. In this article, we propose methods for authorship attribution in Bengali. More specifically, we propose a supervised framework consisting of lexical and shallow features, and investigate the possibility of using topic-modeling-inspired features to classify documents according to their authors. We created a corpus from nearly all the literary works of three eminent Bengali authors, consisting of 3,000 disjoint samples. Our models performed better than the state of the art, with more than 98% test accuracy for the shallow features and 100% test accuracy for the topic-based features. Further experiments with GloVe vectors [Pennington et al. 2014] showed comparable results, but flexible patterns based on content words and high-frequency words [Schwartz et al. 2013] failed to perform as well as expected.

Details

Language :
English
ISSN :
2375-4699 and 2375-4702
Volume :
16
Issue :
4
Database :
Supplemental Index
Journal :
ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Type :
Periodical
Accession number :
ejs43025252
Full Text :
https://doi.org/10.1145/3099473