1. Arabic light-based stemmer using new rules
- Author
- Hamood Alshalabi, Fatima N. Al-Aswadi, Nazlia Omar, Kamal Ali Alezabi, and Sabrina Tiun
- Subjects
Structure (mathematical logic), General Computer Science, Arabic, business.industry, Computer science, computer.software_genre, language.human_language, Prefix, Infix, Morpheme, language, Artificial intelligence, business, computer, Word length, Natural language processing
- Abstract
Superior stemming algorithms aid significantly in many natural language processing (NLP) applications such as information retrieval. The Arabic light-based stemmer is one of the most important stemming algorithms. However, partly because of the highly inflected and complex morphological structure of the Arabic language, most existing Arabic light-based stemming algorithms eliminate a small number of suffixes, prefixes, or both in the process of recognising the infix patterns to determine roots. The elimination of suffixes and prefixes leads to many inefficient results. Hence, this study aims to develop an improved light-based Arabic stemming algorithm by proposing an appropriate list of suffixes and prefixes, supported by rules based on word length (without using a morpheme or patterns on a stem). Our improved Dlight Arabic stemmer focuses on determining and removing the infix patterns under many word-length rules and according to a specific order of stemming stages, to extract the double, triple and quadruple roots from long and short Arabic words. To evaluate the proposed light-based Arabic stemmer, we compared it against existing Arabic stemmers, namely Light10, Condlight and ARLST. The experimental results showed that the proposed Develop Arabic Light-Based Stemmer (Dlight) obtained the best performance, with an F-measure of 68%, while the other three Arabic stemmers yielded slightly lower F-measures. Finally, establishing an appropriate list of suffixes and prefixes with word-length rules to stem Arabic words can improve the performance of a light-based Arabic stemmer.
- Published
- 2022
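The abstract describes affix removal governed by word-length rules. As a rough illustration of that general idea only, the following Python sketch strips at most one prefix and one suffix, and only when the remaining word keeps a minimum length. The affix lists, the `MIN_STEM_LEN` threshold, and the function name `light_stem` are hypothetical choices made for this example; they are not the lists or rules proposed in the paper, and the sketch does not attempt infix removal or root extraction.

```python
# Illustrative sketch only: affix lists and threshold are assumptions,
# not the lists or rules used by the Dlight stemmer.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

MIN_STEM_LEN = 3  # assumed minimum number of letters left after stripping


def light_stem(word: str, min_len: int = MIN_STEM_LEN) -> str:
    """Strip at most one prefix and one suffix, subject to a length rule."""
    # Try longer prefixes first so that "وال" wins over "و".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break

    # Same policy for suffixes: longest match first, guarded by length.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break

    return word


if __name__ == "__main__":
    print(light_stem("والكتاب"))   # prefix "وال" removed -> "كتاب"
    print(light_stem("المعلمون"))  # "ال" and "ون" removed -> "معلم"
    print(light_stem("ولد"))       # too short to strip "و" -> "ولد"
```

The length guard is what keeps a short word such as "ولد" intact even though it begins with a letter that also appears in the prefix list; a stemmer without such a rule would reduce it to two letters.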