A clustering technique for news articles using WordNet
- Source :
- Knowledge-Based Systems. 36:115-128
- Publication Year :
- 2012
- Publisher :
- Elsevier BV, 2012.
Abstract
- The Web is overcrowded with news articles, an information source that is overwhelming in both volume and diversity. Document clustering is a powerful technique that has been widely used to organize data into smaller, manageable information kernels. Several approaches have been proposed, but they suffer from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters. In this work, we investigate the application of a wide spectrum of clustering algorithms and similarity measures to news articles that originate from the Web. We also propose enhancing the standard k-means algorithm with external knowledge from WordNet hypernyms in two ways: enriching the "bag of words" used before the clustering process and assisting the label generation procedure that follows it. Furthermore, we examine the effect that text preprocessing has on clustering. Operating on a corpus of news articles drawn from major news portals, our comparison of existing clustering methodologies revealed that k-means, despite its simple nature, gives better aggregate results when it comes to efficiency. This advantage is amplified when the algorithm is accompanied by preliminary data cleaning and normalization steps. Moreover, the proposed WordNet-enabled W-k means clustering algorithm significantly improves on standard k-means and also generates useful, high-quality cluster tags through the presented cluster labeling process.
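The twofold use of hypernyms described in the abstract (enriching the bag of words before clustering, then helping to label clusters afterwards) can be illustrated with a minimal sketch. It is not the authors' W-k means implementation: the `TOY_HYPERNYMS` dictionary is a hypothetical stand-in for WordNet lookups, the `weight` parameter and the "highest total weight" labeling rule are assumptions, and the clustering step itself is omitted (any standard k-means over the enriched bags would slot in between the two functions).

```python
from collections import Counter

# Hypothetical stand-in for WordNet hypernym lookups (not real WordNet data).
TOY_HYPERNYMS = {
    "striker": ["player"],
    "goalkeeper": ["player"],
    "senator": ["politician"],
    "minister": ["politician"],
}

def enrich(tokens, hypernyms=TOY_HYPERNYMS, weight=1.0):
    """Build a bag of words in which each token also contributes its hypernyms.

    The weight given to hypernyms is an assumption made for illustration.
    """
    bag = Counter()
    for tok in tokens:
        bag[tok] += 1.0
        for hyper in hypernyms.get(tok, []):
            bag[hyper] += weight
    return bag

def label_cluster(bags, top_n=1):
    """Label a cluster with its highest-weighted terms (ties broken alphabetically)."""
    total = Counter()
    for bag in bags:
        total.update(bag)  # Counter.update adds weights term by term
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]

# Two sports articles that share no surface terms.
docs = [["striker", "goal"], ["goalkeeper", "save"]]
bags = [enrich(d) for d in docs]

# After enrichment the two bags overlap on the shared hypernym.
shared = set(bags[0]) & set(bags[1])
labels = label_cluster(bags)
print(shared, labels)
```

In this toy case the two documents have no words in common until the hypernym is added, which is the synonymy problem the abstract mentions; the same shared hypernym then surfaces as the cluster label.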
- Subjects :
- DBSCAN
Clustering high-dimensional data
Information Systems and Management
Fuzzy clustering
Computer science
Correlation clustering
Conceptual clustering
computer.software_genre
Management Information Systems
Biclustering
Artificial Intelligence
CURE data clustering algorithm
Consensus clustering
Cluster analysis
Information retrieval
Brown clustering
k-means clustering
Document clustering
ComputingMethodologies_PATTERNRECOGNITION
Data stream clustering
Bag-of-words model
Cluster labeling
Canopy clustering algorithm
FLAME clustering
Data mining
computer
Software
Details
- ISSN :
- 09507051
- Volume :
- 36
- Database :
- OpenAIRE
- Journal :
- Knowledge-Based Systems
- Accession number :
- edsair.doi...........7f81544281efc6a5333e216fdc38e49c
- Full Text :
- https://doi.org/10.1016/j.knosys.2012.06.015