Semantic-based multilingual document clustering via tensor modeling
- Source :
- Scopus-Elsevier; EMNLP: Conference on Empirical Methods in Natural Language Processing, Oct 2014, Doha, Qatar, pp. 600-609 (10 p.), DOI: 10.3115/v1/D14-1065
Abstract
- EMNLP, Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25-29 October 2014; international audience. A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cope with this issue, we propose a new document clustering approach for multilingual corpora that (i) exploits a large-scale multilingual knowledge base, (ii) takes advantage of the multi-topic nature of text documents, and (iii) employs a tensor-based model to deal with high dimensionality and sparseness. Results show the significance of our approach and its better performance with respect to classic document clustering approaches, in both balanced and unbalanced corpus evaluations.
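The abstract names three ingredients (a multilingual knowledge base, the multi-topic structure of documents, and a tensor model) without saying how they are combined. The sketch below is one minimal way to wire them together, under assumptions that do not come from the paper: a hypothetical toy dictionary `CONCEPTS` stands in for the knowledge base by mapping words in several languages to shared concept IDs; each document arrives already split into segments with pre-assigned topic IDs; and the tensor model is a plain CP (PARAFAC) decomposition fitted by alternating least squares, with each document assigned to the component on which it loads most heavily. The function names `build_tensor`, `cp_als`, and `cluster_documents` are hypothetical.

```python
import numpy as np

# HYPOTHETICAL toy stand-in for a multilingual knowledge base: words from
# several languages map to shared, language-independent concept IDs.
CONCEPTS = {
    "football": 0, "fútbol": 0, "calcio": 0,
    "goal": 1, "gol": 1,
    "election": 2, "elección": 2, "elezione": 2,
    "vote": 3, "voto": 3,
}


def build_tensor(docs, n_topics, n_concepts):
    """Build a (documents x topics x concepts) count tensor.

    Each document is a list of (topic_id, words) segments; the topic of each
    segment is ASSUMED to be known in advance. Unknown words are ignored."""
    X = np.zeros((len(docs), n_topics, n_concepts))
    for d, segments in enumerate(docs):
        for topic, words in segments:
            for w in words:
                c = CONCEPTS.get(w.lower())
                if c is not None:
                    X[d, topic, c] += 1.0
    return X


def cp_als(X, rank, n_iter=200, seed=0):
    """Rank-`rank` CP (PARAFAC) decomposition of a 3-way tensor by
    alternating least squares; returns factor matrices A, B, C."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Each step solves a least-squares problem for one factor with the
        # other two fixed: tensor-times-factors product, then the
        # pseudo-inverse of the Hadamard product of the other Gram matrices.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


def cluster_documents(docs, n_clusters, n_topics):
    """Cluster documents via the document-mode factor of a CP decomposition."""
    n_concepts = max(CONCEPTS.values()) + 1
    X = build_tensor(docs, n_topics, n_concepts)
    A, _, _ = cp_als(X, rank=n_clusters)
    # Each document goes to the component with its largest-magnitude loading
    # (absolute value, since CP factors are only defined up to sign).
    return np.abs(A).argmax(axis=1)


# Usage: English, Spanish, and Italian documents on two subjects.
docs = [
    [(0, ["football", "goal"])],
    [(0, ["fútbol", "gol"])],
    [(0, ["calcio", "gol", "calcio", "goal"])],
    [(1, ["election", "vote"])],
    [(1, ["elección", "voto"])],
    [(1, ["elezione", "voto", "elezione", "voto"])],
]
labels = cluster_documents(docs, n_clusters=2, n_topics=2)
print(labels)
```

Because words are mapped to concepts before the tensor is built, documents about the same subject in different languages end up in the same cluster without any translation step. The `np.einsum` calls compute each factor update directly from the tensor, which avoids matching a Khatri-Rao product to a particular unfolding order; a realistic corpus would likely call for a sparse tensor representation instead of a dense array.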
- Subjects :
- [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]
Computer science
COMPUTER ANALYSIS
MODELLING
CLUSTERING
Document clustering
Machine translation
Semantics
Knowledge base
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
ALGORITHM
Tensor
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
[SDE]Environmental Sciences
Computing methodologies: Document and text processing
Artificial intelligence
Cluster analysis
Natural language processing
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....8c449c8345580a4f03cbe2df8549843d
- Full Text :
- https://doi.org/10.3115/v1/D14-1065