Arabic spam tweets classification using deep learning.
- Author
- Kaddoura, Sanaa; Alex, Suja A.; Itani, Maher; Henno, Safaa; AlNashash, Asma; Hemanth, D. Jude
- Subjects
- *MACHINE learning, *DEEP learning, *SPAM email, *ONLINE social networks, *SUPPORT vector machines, *FEATURE extraction
- Abstract
With the increased use of social network sites such as Twitter, attackers exploit these platforms to spread counterfeit content, such as fake advertisements or illegal content. Classifying such content is a challenging task, especially in Arabic, whose complex structure makes classification more difficult. This paper presents an approach to classifying Arabic tweets using classical (non-deep) machine learning and deep learning techniques. A tweet corpus was collected through the Twitter API and labelled manually to obtain a reliable dataset. To build an efficient classifier, feature extraction is applied to the corpus. Then, both learning approaches are applied to the created dataset for each feature extraction technique, using N-gram models (uni-gram, bi-gram, and char-gram). The classical machine learning algorithms applied are support vector machines, neural networks, logistic regression, and naïve Bayes. Global Vectors (GloVe) and fastText models are used for the deep learning approach. Precision, recall, and F1-score are the performance measures calculated in this paper. Afterwards, the dataset is enlarged by oversampling the minority class with the synthetic minority oversampling technique (SMOTE) to create a balanced dataset. Among the classical machine learning models, the experimental results show that the neural network algorithm outperforms the other algorithms. For the deep learning approach, GloVe outperforms fastText. [ABSTRACT FROM AUTHOR]
- Published
- 2023
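The abstract names three concrete ingredients (N-gram features, SMOTE balancing, and precision/recall/F1 evaluation) without implementation details. The following is a minimal, dependency-free Python sketch of those ingredients only; it is not the authors' code. All function names are hypothetical, and it assumes whitespace tokenisation, a character n-gram length of 3, the labels "spam"/"non-spam" with "spam" as the positive class, and SMOTE applied to numeric feature vectors. The classifiers (SVM, neural network, logistic regression, naïve Bayes) and the GloVe/fastText models are not shown.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n):
    """Word n-grams from whitespace tokens (n=1: uni-grams, n=2: bi-grams)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams; Python indexes str by code point, so Arabic letters work."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_features(text, char_n=3):
    """Bag of word uni-/bi-grams plus char n-grams (char_n=3 is an assumed default).
    Prefixes stop a word uni-gram from colliding with an identical char n-gram."""
    feats = Counter()
    feats.update("w1:" + g for g in word_ngrams(text, 1))
    feats.update("w2:" + g for g in word_ngrams(text, 2))
    feats.update(f"c{char_n}:{g}" for g in char_ngrams(text, char_n))
    return feats


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Binary precision, recall and F1 for the assumed positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling of numeric minority-class vectors: each synthetic
    point lies on the segment between a random minority sample and one of its
    k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by distance to x; keep the k closest.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        nn = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    print(char_ngrams("abcd", 2))  # ['ab', 'bc', 'cd']
    y_true = ["spam", "spam", "non-spam", "non-spam"]
    y_pred = ["spam", "non-spam", "spam", "non-spam"]
    print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote(minority, n_new=4, k=2)))  # 4
```

In practice these steps are usually delegated to libraries, for example scikit-learn's `CountVectorizer` (with `analyzer="char"` and `ngram_range`) and `precision_recall_fscore_support`, and imbalanced-learn's `SMOTE`; the pure-Python version only makes the mechanics explicit.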