1. Morfeus+: Word parsing in Basque beyond morphological segmentation
- Author
Koldo Gojenola, Xabier Artola, Itziar Aduriz, Zuhaitz Beloki, Jose Maria Arriola, and Nerea Ezeiza
- Subjects
Agglutinative language, Linguistics and Language, Parsing, Grammar, business.industry, Computer science, media_common.quotation_subject, 02 engineering and technology, computer.software_genre, Language and Linguistics, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Word structure, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Morphological segmentation, Word (computer architecture), Natural language processing, media_common
- Abstract
This work describes the formalization of a word structure grammar that represents the complex morphological and morphosyntactic information embedded in the word forms of an agglutinative language (Basque). The grammar gives a comprehensive linguistic description of the main morphological phenomena, such as affixation, derivation, and composition, and models both standard and non-standard words. We identify the relevant issues that must be addressed when representing such a grammar. We also present the development of Morfeus+, a tool for the analysis of unrestricted texts; we test its applicability and show that its coverage is wide and robust, allowing efficient processing of large volumes of data. This paper describes a mature system, developed over several person-years, that aims to integrate a rigorous linguistic specification with more practical implementation matters, such as the appropriate treatment of unknown words in unrestricted texts.
- Published
- 2020
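To illustrate what "word parsing beyond morphological segmentation" can mean, here is a minimal sketch. It is not the Morfeus+ implementation or the paper's grammar. It assumes that a segmentation step has already split a word into categorized morphemes, and it then applies a toy set of binary word-structure rules with a CKY-style chart to recover the word's internal hierarchy. The categories (`N`, `SUF_GEN_LOC`, `MOD`, `DET`, `NP`), the rule table `RULES`, and the function `parse_word` are all hypothetical. The example word is *etxekoa* ("the one of the house"), segmented as *etxe* "house" + *-ko* (locative genitive) + *-a* (determiner).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical morpheme categories and rules for illustration only; they are
# NOT the Morfeus+ tagset or grammar, just a toy fragment.
RULES = {
    ("N", "SUF_GEN_LOC"): "MOD",  # noun + locative-genitive suffix -> adnominal modifier
    ("MOD", "DET"): "NP",         # modifier + determiner -> nominal phrase
    ("N", "DET"): "NP",           # bare noun + determiner -> nominal phrase
}


@dataclass(frozen=True)
class Node:
    cat: str
    form: str
    children: tuple = ()

    def bracket(self) -> str:
        """Render the tree as a labelled bracketing."""
        if not self.children:
            return f"{self.form}/{self.cat}"
        inner = " ".join(child.bracket() for child in self.children)
        return f"[{self.cat} {inner}]"


def parse_word(morphemes, goal: str) -> Optional[Node]:
    """CKY-style chart parse of an already segmented word.

    morphemes: list of (surface form, category) pairs.
    Returns one tree of category `goal` spanning all morphemes, or None.
    """
    n = len(morphemes)
    # chart[i][j] maps a category to one tree covering morphemes i..j-1
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, (form, cat) in enumerate(morphemes):
        chart[i][i + 1][cat] = Node(cat, form)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between left and right parts
                for lcat, left in chart[i][k].items():
                    for rcat, right in chart[k][j].items():
                        parent = RULES.get((lcat, rcat))
                        # keep only the first tree found per category and span
                        if parent and parent not in chart[i][j]:
                            chart[i][j][parent] = Node(
                                parent, left.form + right.form, (left, right)
                            )
    return chart[0][n].get(goal)


if __name__ == "__main__":
    # etxe 'house' + -ko (locative genitive) + -a (determiner) -> etxekoa
    tree = parse_word([("etxe", "N"), ("ko", "SUF_GEN_LOC"), ("a", "DET")], goal="NP")
    print(tree.bracket())  # [NP [MOD etxe/N ko/SUF_GEN_LOC] a/DET]
```

A flat segmentation yields only the sequence etxe + ko + a. The bracketing adds the information that the determiner *-a* attaches to the modifier *etxeko* as a whole rather than to the noun stem. A realistic word grammar would also need feature structures (case, number, agreement), far more rules, and a way to return every analysis of an ambiguous word. This toy keeps only a single tree per category and span.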