Knowledge-Based Representation for Transductive Multilingual Document Classification

Authors :
Romeo, S.
Ienco, D.
Tagarelli, A.
Universita Mediterranea of Reggio Calabria [Reggio Calabria]
Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)
Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica [Calabria] (DIMES)
Università della Calabria [Arcavacata di Rende] (Unical)
ADVanced Analytics for data SciencE (ADVANSE)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Source :
Lecture Notes in Computer Science: Advances in Information Retrieval, 37th European Conference on IR Research (ECIR 2015), Proceedings, Mar 2015, Vienna, Austria, pp. 92-103, ⟨10.1007/978-3-319-16354-3_11⟩
Publication Year :
2015
Publisher :
HAL CCSD, 2015.

Abstract

Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, adding to the well-known difficulty of obtaining high-quality labeled datasets. To overcome these issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any translation. We then use a state-of-the-art transductive learner to classify the documents. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model with respect to the document representations usually employed in multilingual and cross-lingual analysis, as well as the robustness of the transductive setting for multilingual document classification.
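The pipeline described in the abstract (map documents in different languages onto shared concepts, then let a transductive learner spread a few known labels to the unlabeled documents) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: a tiny hand-written lexicon (TOY_LEXICON) stands in for BabelNet, documents become bags of hypothetical concept IDs, document similarity is cosine similarity, and the learner is a basic iterative label propagation with clamped seeds, not necessarily the transductive learner the authors used.

```python
import math

# Hypothetical toy lexicon standing in for BabelNet: maps (language, word) to a
# language-independent concept ID. These are NOT real BabelNet synset IDs.
TOY_LEXICON = {
    ("en", "dog"): "C_DOG", ("it", "cane"): "C_DOG", ("fr", "chien"): "C_DOG",
    ("en", "cat"): "C_CAT", ("it", "gatto"): "C_CAT", ("fr", "chat"): "C_CAT",
    ("en", "bank"): "C_BANK", ("it", "banca"): "C_BANK", ("fr", "banque"): "C_BANK",
    ("en", "money"): "C_MONEY", ("it", "soldi"): "C_MONEY", ("fr", "argent"): "C_MONEY",
}


def to_concepts(lang, text):
    """Map a document to a bag of concept IDs; words missing from the lexicon are dropped."""
    bag = {}
    for word in text.lower().split():
        concept = TOY_LEXICON.get((lang, word))
        if concept is not None:
            bag[concept] = bag.get(concept, 0) + 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse concept bags (dicts)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def propagate(docs, seeds, n_iter=50):
    """Basic label propagation over a cosine-similarity graph of documents.

    docs: list of concept bags; seeds: {doc index: label} for the labeled documents.
    Labeled documents are clamped to their label at every iteration.
    """
    labels = sorted(set(seeds.values()))
    n, k = len(docs), len(labels)
    w = [[cosine(docs[i], docs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Initial scores: one-hot for seeds, uniform for unlabeled documents.
    f = [[1.0 if seeds[i] == lab else 0.0 for lab in labels] if i in seeds
         else [1.0 / k] * k for i in range(n)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if i in seeds or total == 0:
                new_f.append(f[i])  # clamp seeds; isolated documents keep their scores
                continue
            # Similarity-weighted average of the neighbours' label scores.
            new_f.append([sum(w[i][j] * f[j][c] for j in range(n)) / total for c in range(k)])
        f = new_f
    return [labels[max(range(k), key=lambda c: row[c])] for row in f]
```

For example, with an English and an Italian seed document, a French document about pets and an unlabeled Italian document about money land in the expected classes, even though no document was translated:

```python
docs = [to_concepts("en", "dog cat dog"), to_concepts("it", "banca soldi"),
        to_concepts("fr", "chien chat"), to_concepts("it", "soldi banca banca")]
print(propagate(docs, {0: "animals", 1: "finance"}))
```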

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.dedup.wf.001..13e9de7af758381da89fc6753ce83071
Full Text :
https://doi.org/10.1007/978-3-319-16354-3_11