Zero-shot hashtag segmentation for multilingual sentiment analysis

Authors :
Rodrigues, Ruan Chaves
Inuzuka, Marcelo Akira
Gomes, Juliana Resplande Sant'Anna
Rocha, Acquila Santos
Calixto, Iacer
Nascimento, Hugo Alexandre Dantas do
Publication Year :
2021

Abstract

Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets. It usually precedes tasks such as sentiment analysis and hate speech detection. For sentiment analysis in medium to low-resourced languages, previous research has demonstrated that a multilingual approach that resorts to machine translation can be competitive or superior to previous approaches to the task. We develop a zero-shot hashtag segmentation framework and demonstrate how it can be used to improve the accuracy of multilingual sentiment analysis pipelines. Our zero-shot framework establishes a new state-of-the-art for hashtag segmentation datasets, surpassing even previous approaches that relied on feature engineering and language models trained on in-domain data.

Comment: 12 pages, 5 figures, 5 tables

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2112.03213
Document Type :
Working Paper