Cross-Language Text Classification.

Authors :
Olsson, J. Scott
Oard, Douglas W.
Hajič, Jan
Source :
SIGIR Forum; 2005 Proceedings, p645-646, 2p, 1 Chart
Publication Year :
2005

Abstract

This article presents a study that used English training data to classify Czech documents, a cross-language text classification task. The dataset is a collection of manually transcribed, spontaneous, conversational speech in English and Czech. Indexing of the English documents proceeds by first checking whether each term is already present in the probabilistic dictionary. If it is, the term's frequency is incremented. Precision was calculated over the five and ten highest-ranked thesaurus labels, as well as over the five highest-ranked concept labels alone.
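
The indexing step and the precision measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the layout of the dictionary (an English term mapped to hypothetical translation probabilities), and the choice to skip terms missing from the dictionary are all assumptions, since the abstract does not say what happens to such terms.

```python
from collections import Counter


def index_document(tokens, dictionary):
    """Count term frequencies for tokens found in the dictionary.

    `dictionary` is assumed to map an English term to its translation
    probabilities; only membership is checked here. Skipping terms that
    are absent is an assumption, not something the abstract states.
    """
    counts = Counter()
    for term in tokens:
        if term in dictionary:
            counts[term] += 1
    return counts


def precision_at_k(ranked_labels, relevant_labels, k):
    """Fraction of the k highest-ranked labels that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for label in ranked_labels[:k] if label in relevant_labels)
    return hits / k


# Hypothetical toy data, for illustration only.
toy_dictionary = {"war": {"válka": 0.9}, "camp": {"tábor": 0.8}}
toy_counts = index_document(["war", "camp", "war", "xyz"], toy_dictionary)
toy_precision = precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "z"}, 5)
```

With this toy input, "war" is counted twice, "camp" once, and "xyz" is ignored. Two of the five top-ranked labels are relevant, so precision at five is 0.4. The same function called with k set to 10 would give precision over the ten highest-ranked labels.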

Details

Language :
English
ISSN :
01635840
Database :
Complementary Index
Journal :
SIGIR Forum
Publication Type :
Periodical
Accession number :
19054849