
Short Text Feature Selection for Micro-Blog Mining

Authors :
Zitao Liu
Shuran Wang
Wenchao Yu
Wei Chen
Fengyi Wu
Source :
2010 International Conference on Computational Intelligence and Software Engineering.
Publication Year :
2010
Publisher :
IEEE, 2010.

Abstract

Feature selection is the process of extracting, from the original feature set, a subset of features that best represents the original meaning. It greatly reduces text processing time and increases accuracy by removing outliers from the data. With the rapid development of Web 2.0 and the continued evolution of the Internet, short texts such as micro-blog posts play an important role in people's daily lives. However, existing feature selection methods cannot effectively extract features from such short texts, which greatly reduces the performance of short text classification and clustering. To address this, we propose a novel feature selection method based on part-of-speech and HowNet. According to the composition of the text, we select the words that carry more information on the basis of their part of speech, and then expand the semantic features of these words using HowNet, so that the short text has more useful features. We use a test data set collected from Sina micro-blog and adopt the micro-average and macro-average of the F1-measure to evaluate short text classification. The results show that the short text features selected by our method carry a good amount of information and yield good classification results.
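The abstract names micro-averaged and macro-averaged F1 as its evaluation measures without defining them. The following is a minimal Python sketch of the standard definitions for single-label classification; it is not the authors' evaluation code, and the function names and toy labels are assumptions made for illustration only.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions.

    Hypothetical helper, not taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# Toy labels, invented for this example.
y_true = ["sports", "sports", "sports", "tech"]
y_pred = ["sports", "sports", "tech", "tech"]
micro, macro = micro_macro_f1(y_true, y_pred)
```

Macro-averaging gives small classes the same weight as large ones, while micro-averaging is dominated by frequent classes, which is a common reason to report both.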

Details

Database :
OpenAIRE
Journal :
2010 International Conference on Computational Intelligence and Software Engineering
Accession number :
edsair.doi...........88d3eae37776883d29020267c9fbd2bb