Moroccan Arabic vocabulary generation using a rule-based approach
- Source :
- Journal of King Saud University - Computer and Information Sciences. 34:8538-8548
- Publication Year :
- 2022
- Publisher :
- Elsevier BV, 2022.
Abstract
- NLP resources play a crucial role in building many NLP applications. Their value depends not only on their size and coverage but also on the richness and precision of the annotations they provide. For resource-scarce languages such as Moroccan Arabic (MA), the development of NLP applications is limited by the lack of such resources. To address this problem, we follow a rule-based approach to generate a Moroccan morphological vocabulary (MORV), which constitutes the first step in addressing the problem of Moroccan morphological generation. MORV is designed and implemented around two main components: on the one hand, an MA lexicon and a list of fully annotated affixes and clitics that we created specifically to support the generation process; on the other hand, a set of rules covering the concatenation of these elements and the orthographic adjustments of the generated words. Given base forms as input, MORV outputs more than 4.5 million Moroccan words annotated with rich morphological features such as tense, gender, number, and state. We tested MORV's coverage on texts collected from Moroccan social media and found that it reaches a vocabulary coverage of 84% and a precision of 94%. The system can benefit the development of other NLP applications such as spell checking, morphological analysis, and machine translation.
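The abstract describes generation as concatenating lexicon base forms with annotated affixes and clitics, then applying orthographic adjustments, and it reports vocabulary coverage on social-media text. The Python sketch below illustrates that general idea under loudly stated assumptions: the one-entry lexicon, the suffix and clitic inventories, the feature names, the Latin transliteration (chosen only for readability), and the single vowel-deletion rule are hypothetical toy stand-ins, not MORV's actual resources or rules. The `coverage` function uses one possible token-level definition, since the record does not specify how the 84% figure was computed.

```python
"""Toy rule-based generator in the spirit of the abstract.

All data and rules below are hypothetical illustrations, NOT MORV's
lexicon, affix inventory, feature set, or orthographic rules.
"""
from itertools import product

# Hypothetical lexicon: base forms with static features (Latin transliteration).
LEXICON = [
    {"base": "kteb", "pos": "verb", "gloss": "write"},
]

# Hypothetical annotated suffixes: (surface form, morphological features).
SUFFIXES = [
    ("", {"tense": "perfective", "person": "3", "gender": "masc", "number": "sg"}),
    ("at", {"tense": "perfective", "person": "3", "gender": "fem", "number": "sg"}),
    ("t", {"tense": "perfective", "person": "1/2", "number": "sg"}),
    ("na", {"tense": "perfective", "person": "1", "number": "pl"}),
    ("u", {"tense": "perfective", "person": "3", "number": "pl"}),
]

# Hypothetical enclitics; the empty entry means "no clitic attached".
CLITICS = [
    ("", {}),
    ("ha", {"object_clitic": "3fs"}),
]

VOWELS = set("aeiou")


def concat(stem: str, affix: str) -> str:
    """Concatenate stem and affix, applying one toy orthographic adjustment:
    a vowel-initial affix deletes the 'e' of a stem ending in 'e' + consonant
    (e.g. kteb + at -> ktbat)."""
    if (affix and affix[0] in VOWELS and len(stem) >= 2
            and stem[-2] == "e" and stem[-1] not in VOWELS):
        stem = stem[:-2] + stem[-1]
    return stem + affix


def generate(lexicon, suffixes, clitics):
    """Map each generated surface form to its list of feature analyses."""
    vocabulary = {}
    for entry, (suf, suf_feats), (clit, clit_feats) in product(lexicon, suffixes, clitics):
        form = concat(concat(entry["base"], suf), clit)
        analysis = {"lemma": entry["base"], "pos": entry["pos"], **suf_feats, **clit_feats}
        # A form can have several analyses (ambiguity), so keep a list.
        vocabulary.setdefault(form, []).append(analysis)
    return vocabulary


def coverage(tokens, vocabulary):
    """Token-level coverage: share of corpus tokens present in the vocabulary.
    This is one possible definition, assumed for illustration."""
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


vocab = generate(LEXICON, SUFFIXES, CLITICS)
print(len(vocab))  # 10 forms: 5 suffixes x 2 clitic options
print(sorted(vocab))
print(coverage(["ktbat", "ktebna", "mcha"], vocab))  # 2 of 3 tokens covered
```

Keeping a list of analyses per surface form lets the same string carry several feature annotations; a full system would add further affix slots (for example prefixes and proclitics) and many more adjustment rules than this single toy one.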
- Subjects :
- Vocabulary
General Computer Science
Machine translation
Computer science
Concatenation
Spell
020206 networking & telecommunications
Rule-based system
02 engineering and technology
Lexicon
Set (abstract data type)
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
State (computer science)
Artificial intelligence
Natural language processing
Details
- ISSN :
- 13191578
- Volume :
- 34
- Database :
- OpenAIRE
- Journal :
- Journal of King Saud University - Computer and Information Sciences
- Accession number :
- edsair.doi...........d8a322fcb7519b0e95b880ab168da65a
- Full Text :
- https://doi.org/10.1016/j.jksuci.2021.02.013