Unsupervised morphological segmentation based on affixality measurements

Authors :
Gerardo Sierra
Carlos-Francisco Méndez-Cruz
Alfonso Medina-Urrea
Source :
Pattern Recognition Letters. 84:127-133
Publication Year :
2016
Publisher :
Elsevier BV, 2016.

Abstract

A new method for unsupervised morphological segmentation is presented. The method is based on a combination of affixality measurements. The method performed well for Spanish multi-slot morphology. In an empirical evaluation, the new method outperformed Morfessor and ParaMor. Results show that our method is competitive for Spanish morphological segmentation.

In this paper, we present a method for unsupervised morphological segmentation of multi-slot morphology based on affixality measurements. These measurements quantify three linguistic characteristics of affixes: (1) they combine with many low-frequency word-bases (high combinatorial capacity); (2) although they are relatively few, they help to maximize the size of a lexicon (economy principle), i.e., speakers know more words by remembering fewer morphological items; and (3) they are very frequent, so they carry less information than word-bases (entropy), i.e., borders between affixes and stems can be detected by finding entropy peaks. Several experiments combining these measurements were conducted to find the best way to apply them to data. The best strategy is successive segmentation whenever the average of the affixality measurements exceeds a threshold of 0.5. We also compared this strategy with two state-of-the-art methods for unsupervised morphological segmentation, Morfessor and ParaMor. Our method outperformed both when tested on a hand-made corpus. Results indicate that our proposal is competitive, at least for the morphological segmentation of Spanish words.
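The abstract's best strategy averages affixality measurements at each candidate boundary and segments where the average exceeds 0.5. As a rough illustration only, the Python sketch below scores every internal boundary of a word with two simplified proxies: a Harris-style branching entropy of the character that follows the candidate stem (standing in for the entropy measurement) and the number of distinct stems the candidate suffix attaches to in the lexicon (standing in for combinatorial capacity). Both are normalized per word, averaged, and the word is cut at every boundary whose average exceeds 0.5. These formulas, the function names `build_tables`, `boundary_scores` and `segment`, and the toy verb list are hypothetical choices for illustration, not the authors' definitions; the economy measurement is left out because the abstract does not give its formula.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon of Spanish verb forms (not the paper's corpus).
LEXICON = [
    "cantamos", "cantaba", "cantaban", "canto", "cantas",
    "hablamos", "hablaba", "hablaban", "hablo", "hablas",
    "tomamos", "tomaba", "tomaban", "tomo", "tomas",
]


def build_tables(words):
    """Collect the lexicon statistics the two proxy measures need."""
    # successors[prefix] counts which character follows `prefix`;
    # "#" marks end-of-word so complete words also count as a branch.
    successors = defaultdict(Counter)
    # stems_of[suffix] is the set of distinct strings preceding `suffix`.
    stems_of = defaultdict(set)
    for w in words:
        for i in range(1, len(w) + 1):
            nxt = w[i] if i < len(w) else "#"
            successors[w[:i]][nxt] += 1
        for i in range(1, len(w)):
            stems_of[w[i:]].add(w[:i])
    return successors, stems_of


def branching_entropy(counter):
    """Shannon entropy (bits) of the next-character distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def boundary_scores(word, successors, stems_of):
    """Average of two normalized proxies for each cut word[:i] | word[i:]."""
    cuts = range(1, len(word))
    ent = {i: branching_entropy(successors[word[:i]]) for i in cuts}
    cap = {i: len(stems_of[word[i:]]) for i in cuts}
    # Normalize each proxy by its maximum within this word (assumption).
    max_ent = max(ent.values(), default=0.0) or 1.0
    max_cap = max(cap.values(), default=0) or 1
    return {i: (ent[i] / max_ent + cap[i] / max_cap) / 2 for i in cuts}


def segment(word, successors, stems_of, threshold=0.5):
    """Cut at every boundary whose averaged score exceeds the threshold."""
    scores = boundary_scores(word, successors, stems_of)
    cuts = [i for i in sorted(scores) if scores[i] > threshold]
    pieces, start = [], 0
    for i in cuts + [len(word)]:
        pieces.append(word[start:i])
        start = i
    return pieces


successors, stems_of = build_tables(LEXICON)
print(segment("cantamos", successors, stems_of))  # ['canta', 'mos']
```

On this toy lexicon, "cantamos" is split into "canta" + "mos": the entropy peak after "canta" (the next letter may be m, b or s) pushes that boundary over the threshold, while the boundary before the final "s" scores exactly 0.5 and is therefore not cut. With real data, the per-word normalization and the choice of proxies would strongly affect where the 0.5 threshold falls.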

Details

ISSN :
0167-8655
Volume :
84
Database :
OpenAIRE
Journal :
Pattern Recognition Letters
Accession number :
edsair.doi...........32ff4746865dd3aa3a9fed40869e72d0
Full Text :
https://doi.org/10.1016/j.patrec.2016.09.001