
Topic detection with recursive consensus clustering and semantic enrichment

Authors :
Vincenzo De Leo
Michelangelo Puliga
Marco Bardazzi
Filippo Capriotti
Andrea Filetti
Alessandro Chessa
Source :
Humanities & Social Sciences Communications, Vol 10, Iss 1, Pp 1-10 (2023)
Publication Year :
2023
Publisher :
Springer Nature, 2023.

Abstract

Extracting meaningful information from short texts such as tweets has proved to be a challenging task. The literature on topic detection focuses mostly on methods that try to guess the plausible words describing topics whose number has been decided in advance. The resulting topics change with the initial setup of the algorithms and show substantial instability, with words moving from one topic to another. In this paper we propose an iterative procedure for topic detection that searches for the most stable solutions in terms of the words describing a topic. The procedure combines clustering of the consensus matrix with traditional topic detection to find both a stable set of words and an optimal number of topics. We observe, however, that in several cases the procedure does not converge to a unique value but oscillates. We further enhance the methodology with semantic enrichment via word embeddings, with the aim of reducing noise and improving topic separation. We foresee applying this set of techniques to automatic topic discovery in noisy channels such as Twitter and other social media.
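The idea of a consensus matrix over repeated topic-detection runs can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each run of a topic model has already been reduced to a mapping from word to topic label (for example, each word's most probable topic) and that all runs share the same vocabulary. It also uses simple thresholding plus connected components as a stand-in for the clustering step. The function names `consensus_matrix` and `stable_groups`, the `threshold` parameter, and the toy data are hypothetical.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Build a word-by-word consensus matrix from several topic-detection runs.

    Each run is a dict mapping word -> topic label (hypothetical input format).
    Entry [i][j] is the fraction of runs in which words i and j share a topic.
    """
    if not runs:
        raise ValueError("need at least one run")
    words = sorted(runs[0])
    if any(sorted(run) != words for run in runs):
        raise ValueError("all runs must cover the same vocabulary")
    index = {w: k for k, w in enumerate(words)}
    n = len(words)
    counts = [[0] * n for _ in range(n)]
    for run in runs:
        # Labels are compared only within a run, so topic numbering may differ between runs.
        for a, b in combinations(run, 2):
            if run[a] == run[b]:
                i, j = index[a], index[b]
                counts[i][j] += 1
                counts[j][i] += 1
    matrix = [[c / len(runs) for c in row] for row in counts]
    for i in range(n):
        matrix[i][i] = 1.0  # a word always agrees with itself
    return words, matrix


def stable_groups(words, matrix, threshold=0.8):
    """Group words linked by consensus >= threshold (connected components).

    A simple stand-in for clustering the consensus matrix, not the paper's method.
    """
    seen, groups = set(), []
    for start in range(len(words)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(words[i])
            for j in range(len(words)):
                if j not in seen and matrix[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return groups


# Toy example: three runs of a hypothetical topic model over four words.
runs = [
    {"goal": 0, "match": 0, "vote": 1, "election": 1},
    {"goal": 1, "match": 1, "vote": 0, "election": 0},
    {"goal": 0, "match": 0, "vote": 0, "election": 1},
]
words, matrix = consensus_matrix(runs)
print(stable_groups(words, matrix, threshold=0.6))  # [['election', 'vote'], ['goal', 'match']]
print(stable_groups(words, matrix, threshold=0.8))  # [['election'], ['goal', 'match'], ['vote']]
```

Because labels are only compared within a single run, the matrix is unaffected by the arbitrary numbering of topics across runs. Word pairs that are always grouped together score 1, while words that drift between topics get intermediate values and separate out as the threshold rises. The iterative part of the paper's procedure, including the search for an optimal number of topics and the word-embedding enrichment, is not reproduced here.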

Details

Language :
English
ISSN :
2662-9992
Volume :
10
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Humanities & Social Sciences Communications
Publication Type :
Academic Journal
Accession number :
edsdoj.59aabb5d471b4c978da980b7da4317c4
Document Type :
article
Full Text :
https://doi.org/10.1057/s41599-023-01711-0