An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data

Authors :
Liu, Genggeng
Guo, Canyang
Xie, Lin
Liu, Wenxi
Xiong, Naixue
Chen, Guolong
Publication Year :
2020

Abstract

In the era of big data, the large amount of text data generated on the Internet has given rise to a variety of text representation methods. In natural language processing (NLP), text representation transforms text into vectors that a computer can process without losing the original semantic information. However, it is difficult for these methods to effectively extract the semantic features among words and to distinguish polysemous words. Therefore, a text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed to extract text features, and the resulting text feature representation is applied to text classification tasks. The CNN extracts features from the text vectors to capture the semantics among words, and the VAE is introduced to make the text feature space more consistent with a Gaussian distribution. In addition, the output of an improved word2vec model is used as the input to the proposed model to distinguish the different meanings of the same word in different contexts. The experimental results show that the proposed model achieves better classification results than the compared methods when used with the k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) classification algorithms.
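The abstract describes the pipeline only at a high level. As a rough illustration of how a CNN encoder and a VAE latent layer can fit together, here is a minimal pure-Python sketch under stated assumptions. It is not the authors' implementation. The filter width, layer sizes, random weights and all helper names (`conv1d_max`, `linear`, `reparameterize`, `kl_to_standard_normal`, `random_matrix`, `encode`) are hypothetical. The word vectors are random placeholders standing in for the improved word2vec output, and the decoder, reconstruction loss and training loop are omitted.

```python
import math
import random


def conv1d_max(word_vectors, kernels):
    """1-D convolution over a sequence of word vectors, then ReLU and max-over-time pooling.

    word_vectors: list of n vectors (each of length d), e.g. word2vec outputs.
    kernels: list of filters; each filter is a list of h rows of length d.
    Returns one pooled feature per filter.
    """
    features = []
    n = len(word_vectors)
    for kernel in kernels:
        h = len(kernel)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial maximum
        for start in range(n - h + 1):
            # Dot product between the filter and an h-word window of the text.
            s = sum(
                kernel[i][j] * word_vectors[start + i][j]
                for i in range(h)
                for j in range(len(kernel[i]))
            )
            best = max(best, max(0.0, s))  # ReLU, then keep the largest activation
        features.append(best)
    return features


def linear(x, weights, bias):
    """Dense layer: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)); this term pulls the feature space toward a Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical random initialization for the demo (no training is performed)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode(word_vectors, kernels, w_mu, b_mu, w_lv, b_lv, rng):
    """CNN features -> Gaussian posterior parameters -> sampled latent text feature."""
    h = conv1d_max(word_vectors, kernels)
    mu = linear(h, w_mu, b_mu)
    logvar = linear(h, w_lv, b_lv)
    z = reparameterize(mu, logvar, rng)
    return z, mu, logvar, kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    n_words, dim, n_filters, width, latent = 6, 4, 3, 2, 2
    # Placeholder word vectors standing in for the improved word2vec output.
    words = random_matrix(n_words, dim, rng, scale=1.0)
    kernels = [random_matrix(width, dim, rng) for _ in range(n_filters)]
    w_mu, w_lv = random_matrix(latent, n_filters, rng), random_matrix(latent, n_filters, rng)
    b_mu, b_lv = [0.0] * latent, [0.0] * latent
    z, mu, logvar, kl = encode(words, kernels, w_mu, b_mu, w_lv, b_lv, rng)
    # In this sketch, mu is treated as the fixed-length text feature for KNN, RF or SVM.
    print("text feature:", mu, "KL term:", kl)
```

Using the posterior mean `mu` as the text feature for the downstream classifiers is an assumption of this sketch; the abstract does not say whether the mean or a sample is used. The KL term is what, during training, would push the latent feature space toward the standard Gaussian the abstract mentions.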

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2008.12522
Document Type :
Working Paper