Toward the Adoption of Explainable Pre-Trained Large Language Models for Classifying Human-Written and AI-Generated Sentences.

Authors: Petrillo, Luca; Martinelli, Fabio; Santone, Antonella; Mercaldo, Francesco
Source: Electronics (2079-9292); Oct 2024, Vol. 13, Issue 20, p4057, 32p
Publication Year: 2024

Abstract

Pre-trained large language models have demonstrated impressive natural language capabilities, including understanding text, writing, and performing many other tasks. Moreover, with improvements in training and text generation techniques over time, these models have become effective at producing increasingly human-like content. However, they can also be adapted to generate persuasive, contextual content weaponized for malicious purposes, including disinformation and novel social engineering attacks. In this paper, we present a study on identifying human- and AI-generated content using different models. Specifically, we fine-tune several models belonging to the BERT family, an open-source version of the GPT model, ELECTRA, and XLNet, and then perform a text classification task on two labeled datasets: the first consisting of 25,000 sentences generated by both AI and humans, and the second comprising 22,929 abstracts that are either ChatGPT-generated or human-written. Furthermore, in an additional phase, we submit 20 sentences generated by ChatGPT and 20 sentences randomly extracted from Wikipedia to our fine-tuned models to verify their efficiency and robustness. To understand the models' predictions, we perform an explainability phase using two sentences, one AI-generated and one human-written, leveraging integrated gradients and token importance techniques to analyze the words and subwords of each. In the first experiment, we achieved an average accuracy of 99%, precision of 98%, recall of 99%, and F1-score of 99%; in the second, we reached an average accuracy of 51%, precision of 50%, recall of 52%, and F1-score of 51%. [ABSTRACT FROM AUTHOR]
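The abstract names two concrete techniques: fine-tuning BERT-family classifiers on labeled human/AI sentence pairs, and explaining individual predictions with integrated gradients at the token level. The sketch below is not the authors' released code; it is a minimal illustration of both steps using Hugging Face Transformers and Captum, where the checkpoint, the train.csv/test.csv file names, the 0=human/1=AI label convention, and all hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch (not the paper's code): fine-tune a BERT-family classifier
# on human- vs AI-labeled sentences, then attribute one prediction with
# integrated gradients. File names, checkpoint, and hyperparameters are
# assumptions, not values taken from the paper.
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from captum.attr import LayerIntegratedGradients

MODEL_NAME = "bert-base-uncased"  # any BERT-family checkpoint would fit here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

# Hypothetical CSV files with columns "text" (sentence) and
# "label" (0 = human-written, 1 = AI-generated).
dataset = load_dataset("csv",
                       data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()

# --- Explainability: token-level integrated gradients (Captum) ---
model.eval()
model.to("cpu")  # keep the single-example attribution on one device

def forward_logit(input_ids, attention_mask):
    # Return the logit of the assumed "AI-generated" class (index 1).
    return model(input_ids=input_ids, attention_mask=attention_mask).logits[:, 1]

# For a BERT checkpoint the embedding layer sits at model.bert.embeddings;
# other families (ELECTRA, XLNet, GPT variants) expose it under other names.
lig = LayerIntegratedGradients(forward_logit, model.bert.embeddings)

enc = tokenizer("An example sentence to attribute.", return_tensors="pt")
baseline = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

attributions, delta = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline,
    additional_forward_args=(enc["attention_mask"],),
    return_convergence_delta=True,
)

# Sum over the embedding dimension for one importance score per token.
scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
for tok, score in zip(tokens, scores.tolist()):
    print(f"{tok}\t{score:+.4f}")
```

Because BERT tokenizers split rare words into subwords, the printed scores land on word pieces rather than whole words, which matches the abstract's point about analyzing "words and subwords"; positive scores push the prediction toward the class attributed (here, the assumed AI-generated label).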

Details

Language: English
ISSN: 2079-9292
Volume: 13
Issue: 20
Database: Complementary Index
Journal: Electronics (2079-9292)
Publication Type: Academic Journal
Accession Number: 180557515
Full Text: https://doi.org/10.3390/electronics13204057