Document Representation with Statistical Word Senses in Cross-Lingual Document Clustering.

Authors :
Tang, Guoyu
Xia, Yunqing
Cambria, Erik
Jin, Peng
Zheng, Thomas Fang
Source :
International Journal of Pattern Recognition & Artificial Intelligence. Mar2015, Vol. 29 Issue 2, p-1. 26p.
Publication Year :
2015

Abstract

Cross-lingual document clustering is the task of automatically organizing a large collection of multi-lingual documents into a few clusters, depending on their content or topic. It is well known that the language barrier and translation ambiguity are two challenging issues for cross-lingual document representation. To address them, we propose to represent cross-lingual documents through statistical word senses, which are automatically discovered from a parallel corpus through a novel cross-lingual word sense induction model and a sense clustering method. In particular, the former consists of a sense-based vector space model and the latter leverages a sense-based latent Dirichlet allocation. Evaluation on benchmark datasets shows that the proposed models outperform two state-of-the-art methods for cross-lingual document clustering. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0218-0014
Volume :
29
Issue :
2
Database :
Academic Search Index
Journal :
International Journal of Pattern Recognition & Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
108351166
Full Text :
https://doi.org/10.1142/S021800141559003X