
Tackling cyber-aggression: Identification and fine-grained categorization of aggressive texts on social media using weighted ensemble of transformers.

Authors :
Sharif, Omar
Hoque, Mohammed Moshiul
Source :
Neurocomputing. Jun 2022, Vol. 490, p462-481. 20p.
Publication Year :
2022

Abstract

The pervasiveness of aggressive content in social media has become a serious concern for government organizations and tech companies because of its pernicious societal effects. In recent years, social media has been repeatedly used as a tool to incite communal aggression, spread distorted propaganda, damage social harmony and demean the identity of individuals or a community in public spaces. Therefore, detecting aggressive content and restraining its proliferation has become an urgent duty. Studies on the identification of aggressive content have mostly been done for English and other high-resource languages. Automatic systems developed for those languages cannot accurately identify detrimental content written in regional languages like Bengali. To compensate for this insufficiency, this work presents a novel Bengali aggressive text dataset (called 'BAD') with two-level annotation. In level-A, 14158 texts are labeled as either aggressive or non-aggressive. In level-B, 6807 aggressive texts are categorized into religious, political, verbal and gendered aggression classes, with 2217, 2085, 2043 and 462 texts respectively. This paper proposes a weighted ensemble technique including m-BERT, distil-BERT, Bangla-BERT and XLM-R as the base classifiers to identify and classify the aggressive texts in Bengali. The proposed model can readdress the softmax probabilities of the participating classifiers depending on their primary outcomes. This weighting technique has enabled the model to outdo the simple average ensemble and all other machine learning (ML) and deep learning (DL) baselines. It has acquired the highest weighted F1-score of 93.43% in the identification task and 93.11% in the categorization task. The dataset developed as part of this work is available at https://github.com/BAD-Bangla-Aggressive-Text-Dataset [ABSTRACT FROM AUTHOR]
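
The abstract describes the weighted ensemble only in outline, so the following is a minimal sketch of one plausible reading rather than the authors' exact method. It assumes each base classifier's softmax output is scaled by a weight proportional to a prior score for that classifier (for example, its validation F1-score), after which the scaled probabilities are summed and the highest-scoring class is chosen. The function name `weighted_ensemble`, the toy probabilities and the scores are all hypothetical.

```python
import numpy as np


def weighted_ensemble(probs, scores):
    """Combine softmax outputs of several classifiers (illustrative sketch).

    probs  : array-like, shape (n_models, n_samples, n_classes)
    scores : array-like, shape (n_models,), assumed to be a prior quality
             score per model, e.g. validation F1 (an assumption, not taken
             from the paper)
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(scores, dtype=float)
    weights = weights / weights.sum()          # normalise weights to sum to 1
    # Weighted sum over the model axis -> shape (n_samples, n_classes)
    combined = np.tensordot(weights, probs, axes=1)
    return combined, combined.argmax(axis=1)


# Toy data: two hypothetical models, two texts, two classes
# (class 0 = non-aggressive, class 1 = aggressive).
toy_probs = [
    [[0.7, 0.3], [0.1, 0.9]],   # model A
    [[0.2, 0.8], [0.4, 0.6]],   # model B
]
toy_scores = [0.9, 0.3]          # model A assumed stronger on validation data

combined, weighted_pred = weighted_ensemble(toy_probs, toy_scores)
average_pred = np.mean(toy_probs, axis=0).argmax(axis=1)

print("weighted ensemble:", weighted_pred)
print("simple average:   ", average_pred)
```

On this toy input the two schemes disagree on the first text: the simple average follows model B, while the weighted ensemble follows the more trusted model A. This is the kind of difference the abstract attributes to the weighting technique.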

Details

Language :
English
ISSN :
09252312
Volume :
490
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
156362348
Full Text :
https://doi.org/10.1016/j.neucom.2021.12.022