1. Misogynistic attitude detection in YouTube comments and replies: A high-quality dataset and algorithmic models.
- Author
- Singh, Aakash; Sharma, Deepawali; Singh, Vivek Kumar
- Subjects
- *SOCIAL media, *MACHINE learning, *DEEP learning, *MIXED languages
- Abstract
• A curated, high-quality dataset of 12,698 YouTube comments and replies in Hindi-English code-mixed language is proposed for misogynistic attitude detection.
• The dataset supports two tasks: first, identifying an optimistic, pessimistic, or neutral attitude in the content, and then labelling comments as suggestion, appreciation, criticism, offensive, or none.
• A set of algorithmic models drawing on machine learning, deep learning, and transformer-based techniques is applied.
• The mBERT model gives the best performance on both subtasks, with macro average F1 scores of 0.59 and 0.52, and weighted average F1 scores of 0.66 and 0.65, respectively.
• The experimental evaluation and results confirm the suitability of the dataset; real-world applications and possible future extensions of the work are discussed.
Social media platforms are now not only a medium for expressing users' views, feelings, emotions, and sentiments but are also abused to propagate unpleasant and hateful content. Consequently, research efforts have been made to develop techniques and models for automatically detecting and identifying hateful, abusive, vulgar, and offensive content on different platforms. Although significant progress has been made on this task, research on methods for detecting misogynistic attitudes in non-English and code-mixed languages is not well developed. The non-availability of suitable datasets and resources is a main reason for this. This paper therefore attempts to bridge the gap by presenting a high-quality curated dataset in the Hindi-English code-mixed language. The dataset includes 12,698 YouTube comments and replies, with each comment annotated at two levels: first as optimistic or pessimistic, and then into different types at the second level based on the content. The inter-annotator agreement in the dataset is 0.84 for the first subtask and 0.79 for the second subtask, indicating reasonably high annotation quality. Different algorithmic models are explored for automatically detecting the misogynistic attitude expressed in the comments, with the mBERT model giving the best performance on both subtasks (reported macro average F1 scores of 0.59 and 0.52, and weighted average F1 scores of 0.66 and 0.65, respectively). The analysis and results suggest that the dataset can be used for further research on the topic and that the developed models can be applied to the automatic detection of misogynistic attitudes in social media conversations and posts. [ABSTRACT FROM AUTHOR]
- Published
- 2025