Research of Automatic Topic Detection Based on Incremental Clustering
- Source :
- Journal of Software. 23:1578-1587
- Publication Year :
- 2012
- Publisher :
- China Science Publishing & Media Ltd., 2012.
Abstract
- With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area that addresses this problem. As one of the basic tasks of TDT, topic detection is the problem of grouping stories according to the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The proposed method aims to achieve high accuracy and to estimate the true number of topics in the document corpus. A term reweighting algorithm is used to cluster the given document corpus accurately and efficiently, and a self-refinement process of discriminative feature identification is proposed to improve clustering performance. Furthermore, the "aging" nature of topics is used to precluster stories, and the Bayesian information criterion (BIC) is used to estimate the true number of topics. Experimental results on the Linguistic Data Consortium (LDC) dataset TDT-4 show that the proposed model can improve both efficiency and accuracy.
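The general idea of incremental (single-pass) clustering mentioned in the abstract can be illustrated with a minimal sketch. This is not the TPIC method itself: it leaves out term reweighting, the aging-based preclustering, and the BIC estimate of the topic count. The function names, the raw term-count vectors, and the similarity threshold of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def incremental_cluster(stories, threshold=0.3):
    """Assign each story, in arrival order, to the most similar
    existing cluster, or open a new cluster if none is similar enough.

    `threshold` is a hypothetical tuning parameter, not a value
    taken from the paper.
    """
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for text in stories:
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the story into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new topic
            labels.append(len(centroids) - 1)
    return labels


stories = [
    "earthquake hits coastal city",
    "election results announced today",
    "coastal city earthquake damage grows",
    "election results disputed by candidates",
]
labels = incremental_cluster(stories)
print(labels)
```

Because each story is compared only against the current cluster summaries, the corpus is processed in one pass and the number of clusters is not fixed in advance; in this sketch it depends entirely on the assumed threshold.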
- Subjects :
- Computer science
Process (engineering)
Document clustering
Machine learning
Identification (information)
Discriminative model
Bayesian information criterion
Feature (computer vision)
Data mining
Artificial intelligence
Cluster analysis
Software
Details
- ISSN :
- 1000-9825
- Volume :
- 23
- Database :
- OpenAIRE
- Journal :
- Journal of Software
- Accession number :
- edsair.doi...........761e262f084fecab5bdb99d5c16037ab