An Empirical Evaluation of the Zero-Shot, Few-Shot, and Traditional Fine-Tuning Based Pretrained Language Models for Sentiment Analysis in Software Engineering
- Authors
Md Shafikuzzaman, Md Rakibul Islam, Alex C. Rolli, Sharmin Akhter, and Naeem Seliya
- Subjects
Sentiment analysis, software engineering, natural language processing, pretrained language models, GPT-4, zero-shot learning
- Abstract
Recent advances in natural language processing (NLP) have produced pretrained language models (PLMs) that have transformed many NLP tasks, including sentiment analysis in software engineering. Choosing the right PLM is crucial to leveraging these advances effectively. This paper presents the largest comparative evaluation of PLMs for sentiment analysis in software engineering. Specifically, the study quantifies the performance of four traditionally fine-tuned PLMs, five zero-shot PLMs including GPT-4 and GPT-3 models, and three few-shot PLMs on six domain-specific datasets. The selected PLMs are also compared against two traditionally fine-tuned, software engineering domain-specific PLMs and two state-of-the-art tools. The quantitative analysis reveals varying strengths across the PLM types: the traditionally fine-tuned domain-specific PLM seBERT achieves the best results on the larger datasets, whereas few-shot PLMs such as All-DistilRoBERTa perform best on the smaller datasets. A qualitative error analysis using an explainable AI technique uncovers challenges that PLMs still face in sentiment analysis for software engineering. Together, the comprehensive quantitative and qualitative experiments enrich knowledge of sentiment analysis in software engineering through reproducible insights.
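The zero-shot setting evaluated in the paper can be illustrated with a short sketch: a model assigns a sentiment class to a software engineering comment without any task-specific fine-tuning. The snippet below is a hypothetical minimal example, assuming the Hugging Face `transformers` zero-shot pipeline with the `facebook/bart-large-mnli` checkpoint; the paper's own zero-shot PLMs include GPT-4 and GPT-3, whose prompting setup is not reproduced here.

```python
from transformers import pipeline

# Minimal zero-shot sentiment sketch, NOT the paper's exact setup:
# the study evaluates GPT-4/GPT-3 among its zero-shot PLMs, while this
# example assumes the open facebook/bart-large-mnli NLI checkpoint.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

# A software engineering comment of the kind found in such datasets.
comment = "This workaround is ugly, but at least the build passes now."

result = classifier(comment,
                    candidate_labels=["positive", "negative", "neutral"])

# Labels come back sorted by score; the top one is the predicted sentiment.
print(result["labels"][0], round(result["scores"][0], 3))
```

In contrast to this zero-shot usage, the traditionally fine-tuned PLMs in the study are trained on labeled domain-specific data before prediction, which is why their relative strength varies with dataset size.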
- Published
2024