
Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification.

Authors :
Wang, Peng
Xu, Bo
Xu, Jiaming
Tian, Guanhua
Liu, Cheng-Lin
Hao, Hongwei
Source :
Neurocomputing. Jan 2016, Part B, Vol. 174, p806-814. 9p.
Publication Year :
2016

Abstract

Text classification can help users to effectively handle and exploit useful information hidden in large-scale documents. However, the sparsity of data and the semantic sensitivity to context often hinder the classification performance of short texts. To overcome this weakness, we propose a unified framework to expand short texts based on word embedding clustering and a convolutional neural network (CNN). Empirically, semantically related words are usually close to each other in embedding spaces. Thus, we first discover semantic cliques via fast clustering. Then, by using additive composition over word embeddings from context with variable window width, the representations of multi-scale semantic units [1] in short texts are computed. In embedding spaces, the restricted nearest word embeddings (NWEs) [2] of the semantic units are chosen to constitute expanded matrices, where the semantic cliques are used as supervision information. Finally, for a short text, the projected matrix [3] and expanded matrices are combined and fed into the CNN in parallel. Experimental results on two open benchmarks validate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]

[1] Semantic units are defined as n-grams that carry the dominant meaning of the text. By varying n, multi-scale contextual information can be exploited.
[2] To prevent outliers, a Euclidean distance threshold is preset between semantic cliques and semantic units and is used as the restricting condition.
[3] The projected matrix is obtained by table lookup and encodes unigram-level features.
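The expansion step described in the abstract (additive composition over a sliding window, followed by a threshold-restricted nearest-embedding lookup) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `semantic_units` and `restricted_nearest`, the use of a flat array of clique member vectors, and the choice to zero-fill units that fall outside the threshold are all assumptions made for illustration. The clustering step and the CNN itself are not shown.

```python
import numpy as np


def semantic_units(word_vectors, window):
    """Additive composition: sum the embeddings inside each sliding window.

    word_vectors: (n_words, dim) array, one row per word of the short text.
    Returns an array of shape (n_words - window + 1, dim).
    """
    n_words, dim = word_vectors.shape
    if n_words < window:
        return np.empty((0, dim))
    return np.stack(
        [word_vectors[i:i + window].sum(axis=0) for i in range(n_words - window + 1)]
    )


def restricted_nearest(units, clique_vectors, threshold):
    """For each unit, pick the nearest clique vector, or zeros if it is too far.

    Zero-filling rejected units is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    # Pairwise Euclidean distances, shape (n_units, n_clique_vectors).
    dists = np.linalg.norm(units[:, None, :] - clique_vectors[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    within = dists[np.arange(len(units)), nearest] <= threshold
    return np.where(within[:, None], clique_vectors[nearest], 0.0)
```

Calling `semantic_units` with several window widths and passing each result through `restricted_nearest` would yield one expanded matrix per scale, which could then be fed to a CNN alongside the unigram-level projected matrix.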

Details

Language :
English
ISSN :
09252312
Volume :
174
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
111320677
Full Text :
https://doi.org/10.1016/j.neucom.2015.09.096