Is a prompt and a few samples all you need? Using GPT-4 for data augmentation in low-resource classification tasks
- Authors
Møller, Anders Giovanni; Dalsgaard, Jacob Aarup; Pera, Arianna; Aiello, Luca Maria
- Subjects
FOS: Computer and information sciences; FOS: Physical sciences; Computer Science - Computers and Society (cs.CY); Computer Science - Computation and Language (cs.CL); Physics - Physics and Society (physics.soc-ph)
- Abstract
Obtaining and annotating data can be expensive and time-consuming, especially in complex, low-resource domains. We use GPT-4 and ChatGPT to augment small labeled datasets with synthetic data via simple prompts, across three classification tasks of varying complexity. For each task, we randomly select a base sample of 500 texts and generate 5,000 new synthetic samples. We explore two augmentation strategies: one that preserves the original label distribution and one that balances it. Using progressively larger training sample sizes, we train and evaluate a 110M-parameter multilingual language model on the real and synthetic data separately. We also test GPT-4 and ChatGPT in a zero-shot setting on the test sets. Both models show strong zero-shot performance across all tasks. We find that data augmented with synthetic samples yields good downstream performance and is particularly helpful in low-resource settings, such as identifying rare classes. Human-annotated data exhibits strong predictive power, outperforming synthetic data in two of the three tasks. This finding highlights the need for more sophisticated prompting before synthetic datasets can consistently surpass human-generated ones.
- Comments
12 pages, 4 figures, 4 tables
- Published
2023