Extracting Chinese multi-word terms from small corpus

Authors :
Huang Heyan
Zhou Lang
Zhang Liang
Feng Chong
Source :
ISKE
Publication Year :
2008
Publisher :
IEEE, 2008.

Abstract

This paper presents an automatic terminology extraction approach for Chinese multi-word terms. The system combines five linguistic rules, learned from an existing term list by machine learning methods, with two statistical measures: a termhood measure based on variation in term distribution, and a unithood measure that uses left and right entropy to estimate the degree of collocation variation. Candidates are ranked by termhood; the unithood measure filters out prepositional phrases and some verb-object phrases that rarely appear as terms. Validated on a small-scale corpus in the computer domain, the approach reaches a precision of 91.5% on the top 2000 outputs.
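The abstract does not give the exact formula for the unithood measure, so the following is only a minimal sketch of the commonly used left and right (branching) entropy computation, not the authors' implementation. It assumes the corpus is already segmented into a list of tokens; the function name `branching_entropy` and the boundary markers `<s>` and `</s>` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def branching_entropy(tokens, candidate):
    """Return (left_entropy, right_entropy) in bits for `candidate`.

    `tokens` is a list of segmented tokens; `candidate` is a tuple of tokens.
    Assumption: corpus boundaries count as the neighbours "<s>" and "</s>".
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == candidate:
            # Record the token immediately before and after this occurrence.
            left[tokens[i - 1] if i > 0 else "<s>"] += 1
            right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0  # candidate never occurs
        return -sum((c / total) * math.log2(c / total)
                    for c in counter.values())

    return entropy(left), entropy(right)


# Toy example: "x y" occurs three times with varying neighbours.
corpus = ["a", "x", "y", "b", "c", "x", "y", "d", "a", "x", "y", "b"]
left_h, right_h = branching_entropy(corpus, ("x", "y"))
```

In this formulation, a candidate whose neighbours vary widely on both sides (high left and right entropy) behaves like an independent unit, while a candidate that is almost always followed or preceded by the same token (entropy near zero) is more likely a fragment of a longer phrase. How the paper thresholds or combines the two values is not stated in the abstract.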

Details

Database :
OpenAIRE
Journal :
2008 3rd International Conference on Intelligent System and Knowledge Engineering
Accession number :
edsair.doi...........2bf79b7757a3d8d004e930eb99105bdb