Effect of stopwords in Indian language IR.

Authors :
Sahu, Siba Sankar
Pal, Sukomal
Source :
Sādhanā: Academy Proceedings in Engineering Sciences. Mar 2022, Vol. 47 Issue 1, p1-17. 17p.
Publication Year :
2022

Abstract

We explore and evaluate the effect of stopwords on retrieval performance for four Indian languages: Marathi, Bengali, Gujarati and Sanskrit. The issue is investigated from three viewpoints. Is there any impact of non-corpus-based stopword removal on the chosen Indian languages (and if yes, to what extent)? Can we recommend, based on experiment, a number of stopwords for the chosen Indian languages that is good enough from a retrieval point of view? Is there any relationship between stopwords and average document length from a retrieval perspective? We observe that stopword removal generally improves mean average precision (MAP) significantly compared with no removal. For each language, stopword lists of different lengths are explored and evaluated, leading to a suggested optimal length. We also study the effect of stopwords on retrieval performance over document length. The effect of stopwords is generally found to be quite low in short documents compared with their long counterparts across the four Indian languages. [ABSTRACT FROM AUTHOR]
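
The abstract reports results as mean average precision (MAP) with and without stopword removal. As a rough illustration of how these two pieces fit together, the Python sketch below filters tokens against a stopword list and computes MAP from ranked result lists using the standard definition. It is not the authors' code: the function names, the document and query identifiers, and the English tokens are all hypothetical, chosen only to keep the example short.

```python
from typing import Dict, List, Set


def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in stopwords]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in runs."""
    if not runs:
        return 0.0
    scores = [average_precision(runs[q], qrels.get(q, set())) for q in runs]
    return sum(scores) / len(scores)


# Hypothetical data, for illustration only.
tokens = remove_stopwords(["the", "cat", "is", "here"], {"the", "is"})
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
map_score = mean_average_precision(runs, qrels)
```

In an experiment of the kind the abstract describes, one would presumably index and query the collection twice, once with and once without `remove_stopwords` applied, and compare the two MAP values; the retrieval step itself is outside the scope of this sketch.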

Subjects

Subjects :
*LANGUAGE & languages

Details

Language :
English
ISSN :
0256-2499
Volume :
47
Issue :
1
Database :
Academic Search Index
Journal :
Sādhanā: Academy Proceedings in Engineering Sciences
Publication Type :
Academic Journal
Accession number :
154814189
Full Text :
https://doi.org/10.1007/s12046-021-01731-z