Document Representation with Statistical Word Senses in Cross-Lingual Document Clustering.
- Source :
- International Journal of Pattern Recognition & Artificial Intelligence. Mar 2015, Vol. 29 Issue 2, p-1. 26p.
- Publication Year :
- 2015
Abstract
- Cross-lingual document clustering is the task of automatically organizing a large collection of multi-lingual documents into a few clusters, depending on their content or topic. It is well known that language barrier and translation ambiguity are two challenging issues for cross-lingual document representation. To this end, we propose to represent cross-lingual documents through statistical word senses, which are automatically discovered from a parallel corpus through a novel cross-lingual word sense induction model and a sense clustering method. In particular, the former consists in a sense-based vector space model and the latter leverages on a sense-based latent Dirichlet allocation. Evaluation on the benchmarking datasets shows that the proposed models outperform two state-of-the-art methods for cross-lingual document clustering. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 02180014
- Volume :
- 29
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- International Journal of Pattern Recognition & Artificial Intelligence
- Publication Type :
- Academic Journal
- Accession number :
- 108351166
- Full Text :
- https://doi.org/10.1142/S021800141559003X