
Contextual semantic embeddings based on fine-tuned AraBERT model for Arabic text multi-class categorization.

Authors:
El-Alami, Fatima-zahra
Ouatik El Alaoui, Said
En Nahnahi, Noureddine
Source:
Journal of King Saud University - Computer & Information Sciences; Nov 2022, Part A, Vol. 34, Issue 10, p8422-8428, 7p
Publication Year:
2022

Abstract

Although pre-trained word embedding models have advanced a wide range of natural language processing applications, they ignore contextual information and meaning within the text. In this paper, we investigate the potential of the pre-trained Arabic BERT (Bidirectional Encoder Representations from Transformers) model to learn universal contextualized sentence representations, aiming to showcase its usefulness for Arabic text multi-class categorization. We propose to exploit the pre-trained AraBERT model for contextual text representation learning in two different ways: as a transfer-learning model and as a feature extractor. On the one hand, we employ the AraBERT model after fine-tuning its parameters on the OSAC datasets to transfer its knowledge to Arabic text categorization. On the other hand, we examine AraBERT's performance as a feature extractor by combining it with several classifiers, including CNN, LSTM, Bi-LSTM, MLP, and SVM. Finally, we conduct an exhaustive set of experiments comparing two BERT models, namely AraBERT and multilingual BERT. The findings show that the fine-tuned AraBERT model achieves state-of-the-art results, attaining up to 99% in both F1-score and accuracy. [ABSTRACT FROM AUTHOR]
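To make the two strategies in the abstract concrete, the following is a minimal sketch of the transfer-learning route, assuming the Hugging Face "aubmindlab/bert-base-arabert" checkpoint; the class count, documents, labels, and learning rate are illustrative placeholders, not the paper's OSAC setup.

```python
# Minimal sketch of the transfer-learning route: fine-tune AraBERT end to
# end with a classification head. Checkpoint name, class count, texts, and
# labels are illustrative assumptions, not the paper's exact configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "aubmindlab/bert-base-arabert"  # assumed AraBERT checkpoint
NUM_CLASSES = 10                             # hypothetical category count

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CLASSES
)

texts = ["...", "..."]         # Arabic documents (placeholders)
labels = torch.tensor([0, 1])  # hypothetical category ids

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss over classes
outputs.loss.backward()                  # one gradient step of fine-tuning
optimizer.step()
```

The feature-extractor route instead freezes AraBERT and trains a separate classifier on its sentence vectors. In this sketch an SVM stands in for the CNN/LSTM/Bi-LSTM/MLP/SVM family compared in the paper, and pooling the [CLS] token is an assumption about how the sentence representation is formed.

```python
# Minimal sketch of the feature-extractor route: a frozen AraBERT encoder
# yields one [CLS] vector per document, and an SVM is trained on top.
# The [CLS] pooling choice and all data below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

MODEL_NAME = "aubmindlab/bert-base-arabert"  # assumed AraBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()  # frozen: used purely as a feature extractor

def embed(texts):
    """Return one 768-d [CLS] embedding per input document."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
    return hidden[:, 0, :].numpy()  # [CLS] token as the sentence vector

train_texts = ["...", "..."]  # Arabic documents (placeholders)
train_labels = [0, 1]         # hypothetical category ids

clf = SVC(kernel="linear")                 # any downstream classifier works
clf.fit(embed(train_texts), train_labels)
pred = clf.predict(embed(["..."]))         # categorize a new document
```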

Details

Language:
English
ISSN:
1319-1578
Volume:
34
Issue:
10
Database:
Supplemental Index
Journal:
Journal of King Saud University - Computer & Information Sciences
Publication Type:
Academic Journal
Accession number:
160169850
Full Text:
https://doi.org/10.1016/j.jksuci.2021.02.005