A FEATURE EXTRACTION BASED IMPROVED SENTIMENT ANALYSIS ON APACHE SPARK FOR REAL-TIME TWITTER DATA.

Authors :
KANUNGO, PIYUSH
SINGH, HARI
Source :
Scalable Computing: Practice & Experience; Dec2023, Vol. 24 Issue 4, p847-855, 9p
Publication Year :
2023

Abstract

This paper aims to improve the accuracy of sentiment analysis on Apache Spark for real-time, general Twitter data. Much prior work addresses sentiment analysis of offline or stored Twitter data, applying several classification algorithms to relevant features extracted from pre-processed text with well-known feature extraction methods. However, little work exists on sentiment analysis of real-time Twitter data, especially generic data, on big data processing platforms such as Apache Spark. This paper proposes real-time sentiment analysis of generic Twitter data through Apache Spark, using six classification algorithms with N-gram and Term Frequency - Inverse Document Frequency (TF-IDF) feature extraction on the pre-processed data. An exhaustive comparison is carried out using the Logistic Regression (LR), Multinomial Naive Bayes (MNB), Random Forest Classifier (RFC), Support Vector Machine (SVM), K-Nearest Neighbour (K-NN), and Decision Tree (DT) classification algorithms. On the general tweet data considered, the trigram feature extraction method performs best with LR and SVM, and the RFC results are comparable. [ABSTRACT FROM AUTHOR]
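
As a rough illustration of the two feature extraction ideas named in the abstract, the sketch below builds n-gram features from tokenized tweets and weights them with TF-IDF. It is plain Python rather than Apache Spark, and it is not the authors' implementation: the function names `ngrams` and `tfidf`, the sample tweets, and the smoothed IDF formula are all assumptions chosen for illustration, since the abstract does not state the exact weighting used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute TF-IDF weights over n-gram features for tokenized documents.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one
    common variant and not necessarily the one used in the paper.
    """
    grams_per_doc = [ngrams(doc, n) for doc in docs]

    # Document frequency: in how many documents each n-gram appears.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    total = len(docs)
    weights = []
    for grams in grams_per_doc:
        tf = Counter(grams)
        length = len(grams) or 1  # guard against documents shorter than n
        weights.append({
            g: (count / length) * (math.log((1 + total) / (1 + df[g])) + 1)
            for g, count in tf.items()
        })
    return weights


# Hypothetical pre-processed tweets (lowercased, whitespace-tokenized).
tweets = [
    "the service was great".split(),
    "the service was terrible".split(),
]
features = tfidf(tweets, n=3)  # trigram features, as highlighted in the abstract
```

In this toy input, the trigram shared by both tweets receives a lower weight than the trigram unique to each tweet, which is the behaviour a downstream classifier such as LR or SVM would rely on to separate the two.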

Details

Language :
English
ISSN :
18951767
Volume :
24
Issue :
4
Database :
Complementary Index
Journal :
Scalable Computing: Practice & Experience
Publication Type :
Academic Journal
Accession number :
173712342
Full Text :
https://doi.org/10.12694/scpe.v24i4.2343