Unsupervised morphological segmentation based on affixality measurements
- Source :
- Pattern Recognition Letters. 84:127-133
- Publication Year :
- 2016
- Publisher :
- Elsevier BV, 2016.
Abstract
- A new method for unsupervised morphological segmentation is presented. The method is based on a combination of affixality measurements. The method performed well for Spanish multi-slot morphology. In an empirical evaluation, the new method outperformed Morfessor and ParaMor. Results show that our method is competitive for Spanish morphological segmentation. In this paper, we present a method for unsupervised morphological segmentation for multi-slot morphology based on affixality measurements. These measurements quantify three linguistic characteristics of affixes: (1) they combine with many low-frequency word-bases (high combinatorial capacity); (2) although they are relatively few, they help to maximize the size of a lexicon (economy principle), i.e. speakers know more words by remembering fewer morphological items; and (3) they are very frequent, so they carry less information than word-bases (entropy), i.e. borders between affixes and stems can be detected by finding entropy peaks. Several experiments combining these measurements were conducted to find the best way to apply them to data. The best strategy consists of successive segmentation whenever the average of the affixality measurements surpasses a threshold of 0.5. We also compared this strategy with state-of-the-art methods for unsupervised morphological segmentation (Morfessor and ParaMor). Our method outperformed both when tested on a hand-made corpus. Results indicate that our proposal is competitive, at least for the morphological segmentation of Spanish words.
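The decision rule described in the abstract (cut where the average of the affixality measurements exceeds 0.5) and the entropy-peak idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact definitions or normalization of the three measurements, so the function names `successor_entropy`, `affixality`, and `should_cut`, the toy lexicon, and the assumption that each score is already scaled to [0, 1] are all hypothetical.

```python
import math
from collections import Counter


def successor_entropy(prefix, lexicon):
    """Shannon entropy (in bits) of the character that follows `prefix`.

    A high value suggests a stem/affix border right after `prefix`.
    Illustrative stand-in for the entropy measurement, not the paper's formula.
    """
    following = Counter(
        word[len(prefix)]
        for word in lexicon
        if word.startswith(prefix) and len(word) > len(prefix)
    )
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in following.values())


def affixality(scores):
    """Average of the measurements, each assumed to be normalized to [0, 1]."""
    return sum(scores) / len(scores)


def should_cut(scores, threshold=0.5):
    """Segment at a candidate border when the average exceeds the threshold."""
    return affixality(scores) > threshold


# Toy lexicon (assumption): four forms of the Spanish verb "cantar".
toy_lexicon = ["cantar", "cantas", "cantamos", "cantan"]
entropy_after_cant = successor_entropy("cant", toy_lexicon)    # only "a" follows
entropy_after_canta = successor_entropy("canta", toy_lexicon)  # r, s, m, n follow
```

In the toy lexicon the entropy rises after "canta", where four different characters follow, which is the kind of peak the abstract associates with a border. How the three scores are computed and normalized before averaging is left open here.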
- Subjects :
- Segmentation-based object categorization
business.industry
Computer science
Scale-space segmentation
Pattern recognition
02 engineering and technology
Lexicon
03 medical and health sciences
0302 clinical medicine
Artificial Intelligence
Signal Processing
030221 ophthalmology & optometry
0202 electrical engineering, electronic engineering, information engineering
Entropy (information theory)
Unsupervised learning
020201 artificial intelligence & image processing
Segmentation
Computer Vision and Pattern Recognition
Artificial intelligence
business
Morphological segmentation
Software
Details
- ISSN :
- 01678655
- Volume :
- 84
- Database :
- OpenAIRE
- Journal :
- Pattern Recognition Letters
- Accession number :
- edsair.doi...........32ff4746865dd3aa3a9fed40869e72d0
- Full Text :
- https://doi.org/10.1016/j.patrec.2016.09.001