Classifying news articles in multiple languages: leveraging context aware models.
- Source :
- Procedia Computer Science; 2022, Vol. 205, p97-106, 10p
- Publication Year :
- 2022
Abstract
- Despite the recent advances in text classification and the performance improvement yielded by Transformer models, the absence or inaccessibility of an adequate dataset to train a text classifier motivates the choice of alternative routes. In this study, the need to detect specific topics in the news and to discard irrelevant content encouraged the development of an article tagging pipeline which assesses the similarity between a user-defined dictionary of topic-specific keywords and news article keywords. The innovation of the paper lies in its use of two BERT-based algorithms to retrieve article keywords and to embed them, which previous studies have shown to outperform state-of-the-art solutions for keyword extraction and semantic textual similarity. In a nutshell, the pipeline computes the semantic similarity between the sentence embeddings generated from topic-specific keywords and those produced from news article keywords extracted with the KeyBERT algorithm, finally classifying each article according to a previously defined topic. The results are supported by sound coherence and diversity metrics, computed by aggregating the articles by their first tag, which attest to the semantic validity of the pipeline outputs. [ABSTRACT FROM AUTHOR]
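
The tagging step described in the abstract (compare article keywords with a user-defined topic dictionary, rank topics by similarity, discard articles that match nothing) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: a bag-of-words count vector stands in for the BERT-based sentence embeddings, the article keywords are given by hand instead of being extracted with KeyBERT, and the names `embed`, `cosine`, `tag_article`, `TOPICS`, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter


def embed(keywords):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: the paper uses BERT-based embeddings; this is only a placeholder.
    """
    return Counter(word for kw in keywords for word in kw.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_article(article_keywords, topic_dictionary, threshold=0.1):
    """Rank topics by similarity to the article keywords.

    Returns (topic, score) pairs, best first. An empty list means the
    article is treated as irrelevant. The threshold is a hypothetical value.
    """
    article_vec = embed(article_keywords)
    scores = {
        topic: cosine(article_vec, embed(keywords))
        for topic, keywords in topic_dictionary.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(topic, score) for topic, score in ranked if score >= threshold]


# Hypothetical user-defined dictionary of topic-specific keywords.
TOPICS = {
    "energy": ["oil price", "gas supply", "renewable energy"],
    "health": ["vaccine rollout", "hospital capacity", "public health"],
}

# Hand-written keywords standing in for keyword-extraction output.
article_keywords = ["gas supply", "energy market", "price cap"]
tags = tag_article(article_keywords, TOPICS)
first_tag = tags[0][0] if tags else None
```

Here `first_tag` plays the role of the "first tag" by which the abstract says articles are aggregated. With real embeddings, semantically related keywords would score highly even without shared words, which this word-overlap placeholder cannot capture.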
- Subjects :
- ROUTE choice
- SEMANTIC computing
- TECHNOLOGICAL innovations
Details
- Language :
- English
- ISSN :
- 18770509
- Volume :
- 205
- Database :
- Supplemental Index
- Journal :
- Procedia Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- 159269080
- Full Text :
- https://doi.org/10.1016/j.procs.2022.09.011