
Research of Automatic Topic Detection Based on Incremental Clustering

Authors :
Chao Wenhan
Zhang Xiaoming
Li Zhoujun
Source :
Journal of Software. 23:1578-1587
Publication Year :
2012
Publisher :
China Science Publishing & Media Ltd., 2012.

Abstract

With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area that addresses this problem. Topic detection, one of the basic tasks of TDT, is the problem of grouping stories according to the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The method aims to achieve high accuracy while also estimating the true number of topics in the document corpus. A term-reweighting algorithm is used to cluster the given document corpus accurately and efficiently, and a self-refinement process for identifying discriminative features is proposed to improve clustering performance. Furthermore, the "aging" nature of topics is used to precluster stories, and the Bayesian information criterion (BIC) is used to estimate the true number of topics. Experimental results on the Linguistic Data Consortium (LDC) TDT-4 dataset show that the proposed model can improve both efficiency and accuracy.
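To illustrate the general idea of incremental clustering combined with topic "aging", here is a minimal sketch of a generic single-pass clusterer. It is not the TPIC algorithm. Stories arrive in time order, and each one is compared to existing clusters by cosine similarity. The cosine score is discounted by an assumed exponential decay `exp(-decay * age)`. The story joins the best cluster if the discounted score clears a threshold and otherwise starts a new cluster. The names `single_pass_cluster` and `Cluster`, the `threshold` and `decay` parameters, and the decay model itself are hypothetical. The paper's term reweighting, discriminative-feature refinement, and BIC-based estimation of the number of topics are not reproduced here.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    last_time: float = 0.0  # timestamp of the newest story in the cluster
    members: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(stories, threshold=0.3, decay=0.1):
    """Cluster (story_id, timestamp, text) tuples given in time order.

    Hypothetical "aging" model: similarity to a cluster is multiplied by
    exp(-decay * age), where age is the time since the cluster's newest story.
    Uses raw term counts; no term reweighting is applied.
    """
    clusters = []
    for story_id, ts, text in stories:
        vec = Counter(text.lower().split())
        best, best_score = None, threshold
        for c in clusters:
            age = max(0.0, ts - c.last_time)
            score = cosine(vec, c.centroid) * math.exp(-decay * age)
            if score >= best_score:
                best, best_score = c, score
        if best is None:  # no cluster is similar enough: start a new topic
            best = Cluster()
            clusters.append(best)
        # Summing counts keeps the centroid's direction; cosine ignores scale.
        best.centroid.update(vec)
        best.last_time = max(best.last_time, ts)
        best.members.append(story_id)
    return clusters


stories = [
    ("s1", 0, "earthquake hits city center"),
    ("s2", 1, "earthquake death toll rises in city"),
    ("s3", 2, "election results announced today"),
    ("s4", 30, "earthquake anniversary remembered"),
]

clusters = single_pass_cluster(stories)
print([c.members for c in clusters])  # → [['s1', 's2'], ['s3'], ['s4']]
```

In this toy run, story `s4` shares the term "earthquake" with the first cluster. Its raw cosine score (about 0.31) would clear the 0.3 threshold, but the 29-time-unit gap decays that score far below it, so `s4` starts a new cluster. With `decay=0.0`, it would instead join `['s1', 's2']`. A single threshold like this fixes the number of clusters only implicitly. That limitation is the motivation for estimating the number of topics with a model-selection criterion such as BIC.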

Details

ISSN :
10009825
Volume :
23
Database :
OpenAIRE
Journal :
Journal of Software
Accession number :
edsair.doi...........761e262f084fecab5bdb99d5c16037ab