Zero-shot hashtag segmentation for multilingual sentiment analysis
- Publication Year :
- 2021
Abstract
- Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets. It usually precedes tasks such as sentiment analysis and hate speech detection. For sentiment analysis in medium- to low-resource languages, previous research has demonstrated that a multilingual approach relying on machine translation can be competitive with, or superior to, earlier approaches to the task. We develop a zero-shot hashtag segmentation framework and demonstrate how it can be used to improve the accuracy of multilingual sentiment analysis pipelines. Our zero-shot framework establishes a new state of the art on hashtag segmentation datasets, surpassing even previous approaches that relied on feature engineering and language models trained on in-domain data.
- Comment: 12 pages, 5 figures, 5 tables
- Subjects :
- Computer Science - Computation and Language
I.2.7
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2112.03213
- Document Type :
- Working Paper