Classification of tourism destination review texts based on sentiment using k-nearest neighbor with information gain feature selection.
- Source :
- AIP Conference Proceedings. 2024, Vol. 3176 Issue 1, p1-8. 8p.
- Publication Year :
- 2024
Abstract
- Sentiment analysis of tourism destination reviews can serve as feedback for managers to improve the quality of tourism services. Many methods have been used to classify review text by sentiment. k-Nearest Neighbor (KNN) is a classification method that is widely used in sentiment analysis. This simple approach is capable of providing very high accuracy. The main drawback of KNN is its long computing time, so by default it is not recommended for big data computing. This article explains how KNN is combined with Information Gain (IG) feature selection so that only the best terms (words) in the dataset are involved in computing. This research analyzes the review text of a tourism destination on Madura Island, downloaded from Google Maps. The review text was preprocessed using case-folding, cleansing, normalization, tokenization, stop-word removal, and stemming. Each term is given a weight using TF-IDF, and the features are then restricted using IG. Testing shows that the KNN classifier without IG achieves its best accuracy of 98.4% (oversampling only) when k = 1, whereas KNN combined with IG achieves its best accuracy of 97.6% (oversampling plus feature selection) when k = 3 and the IG threshold is 0.0008. The combination of KNN and IG is recommended for classifying large-scale review texts based on sentiment. Reducing the number of features can shorten computing time while maintaining the accuracy of the classifier. [ABSTRACT FROM AUTHOR]
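The pipeline described in the abstract (TF-IDF weighting, IG-based feature restriction, then KNN) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy reviews, the labels, the threshold of 0.5, the cosine similarity measure, and the choice to compute IG on term presence are all assumptions made for illustration, and the oversampling step is omitted.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Inverse document frequency for every term in the tokenized training docs."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / df[t]) for t in df}

def vectorize(doc, idf):
    """TF-IDF weights as a sparse {term: weight} dict; unknown terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()}

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of a term, computed on term presence/absence (an assumption of this sketch)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

def select_terms(docs, labels, threshold):
    """Keep only the terms whose IG reaches the threshold."""
    vocab = {t for d in docs for t in d}
    return {t for t in vocab if information_gain(docs, labels, t) >= threshold}

def restrict(vec, selected):
    return {t: w for t, w in vec.items() if t in selected}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_vecs, train_labels, query, k):
    """Majority vote among the k most similar training vectors."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already preprocessed reviews (invented for illustration only).
docs = [["beautiful", "beach", "clean"],
        ["dirty", "beach", "crowded"],
        ["clean", "view", "beautiful"],
        ["crowded", "dirty", "parking"]]
labels = ["pos", "neg", "pos", "neg"]

idf = fit_idf(docs)
selected = select_terms(docs, labels, threshold=0.5)  # toy threshold, not the paper's 0.0008
train_vecs = [restrict(vectorize(d, idf), selected) for d in docs]

query = restrict(vectorize(["clean", "beach", "beautiful"], idf), selected)
pred = knn_predict(train_vecs, labels, query, k=3)
print(sorted(selected), pred)
```

In this toy data the term "beach" appears equally often in both classes, so its IG is zero and it is dropped before the distance computation. That is the mechanism by which feature selection shrinks the vectors KNN has to compare.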
Details
- Language :
- English
- ISSN :
- 0094243X
- Volume :
- 3176
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- AIP Conference Proceedings
- Publication Type :
- Conference
- Accession number :
- 178717858
- Full Text :
- https://doi.org/10.1063/5.0222729