A comparative study of syllables and character level N-grams for Dravidian multi-script and code-mixed offensive language identification.

Authors :
Balouchzahi, Fazlourrahman
Shashirekha, Hosahalli Lakshmaiah
Sidorov, Grigori
Gelbukh, Alexander
Source :
Journal of Intelligent & Fuzzy Systems; 2022, Vol. 43 Issue 6, p6995-7005, 11p
Publication Year :
2022

Abstract

Curfews and lockdowns around the world during the Covid-19 era have drastically increased internet usage and, accordingly, the amount of data shared on social media. In addition to using social media for sharing useful information, some miscreants use its reach to spread hate speech and offensive content. Filtering offensive content manually is laborious due to the huge volume of data. Further, rapid developments in hardware and software technology have enabled users to post comments not only in English but also in their native language scripts. However, because Roman script is easy to use, social media users, specifically in multilingual countries like India, prefer to comment in code-mixed and multi-script texts. The typical systems employed to process and analyze monolingual texts are usually not appropriate for these kinds of texts. Further, as these texts do not adhere to the rules of any one language for framing words and sentences, the complexity of analyzing them increases. The novelty of the present study lies in addressing the Offensive Language Identification (OLI) task in code-mixed and multi-script texts; to this end, the paper proposes using relevant syllable and character n-gram features to train Machine Learning (ML) classifiers. The performance of the proposed models is evaluated on three Dravidian language pairs, namely Malayalam-English, Tamil-English, and Kannada-English. The performance of the ML classifiers demonstrates the effectiveness of syllable and character n-gram features for analyzing code-mixed and multi-script texts. [ABSTRACT FROM AUTHOR]
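
As a rough illustration of the character n-gram features mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The function name `char_ngrams`, the n-gram range, the lowercasing and whitespace normalization, and the sample comment are all assumptions made for illustration. Syllable segmentation is not attempted here, since the abstract does not describe how the syllables are obtained.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of length n_min..n_max in a comment.

    Hypothetical helper: the normalization below is an assumption,
    not a preprocessing step taken from the paper.
    """
    # Lowercase and collapse runs of whitespace into single spaces.
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the normalized string.
        for i in range(len(normalized) - n + 1):
            counts[normalized[i:i + n]] += 1
    return counts


# Made-up Romanized example; works the same on native-script text,
# because Python strings are sequences of Unicode code points.
features = char_ngrams("Super padam", n_min=2, n_max=3)
```

Counts of this kind could then be turned into vectors and passed to any standard classifier; which classifiers and feature-selection steps the paper actually uses is not stated in the abstract.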

Details

Language :
English
ISSN :
1064-1246
Volume :
43
Issue :
6
Database :
Complementary Index
Journal :
Journal of Intelligent & Fuzzy Systems
Publication Type :
Academic Journal
Accession number :
160553576
Full Text :
https://doi.org/10.3233/JIFS-212872