Data augmentation strategies to improve text classification: a use case in smart cities.

Authors :: Bencke, Luciana
Moreira, Viviane Pereira
Source :: Language Resources & Evaluation. Jun 2024, Vol. 58 Issue 2, p659-694. 36p.
Publication Year :: 2024
Abstract: Text classification is a very common and important task in Natural Language Processing. In many domains and real-world settings, a few labeled instances are the only resource available to train classifiers. Models trained on small datasets tend to overfit and produce inaccurate results. Data augmentation (DA) techniques come as an alternative to minimize this problem. DA generates synthetic instances that can be fed to the classification algorithm during training. In this article, we explore a variety of DA methods, including back translation, paraphrasing, and text generation. We assess the impact of the DA methods over simulated low-data scenarios using well-known public datasets in English, with classifiers built by fine-tuning BERT models. We describe the means to adapt these DA methods to augment a small Portuguese dataset containing tweets labeled with smart city dimensions (e.g., transportation, energy, and water). Our experiments showed that some classes were noticeably improved by DA, with an improvement of 43% in terms of F1 compared to the baseline with no augmentation. In a qualitative analysis, we observed that the DA methods were able to preserve the label but failed to preserve the semantics in some cases, and that generative models were able to produce high-quality synthetic instances. [ABSTRACT FROM AUTHOR]

Subjects :: *DATA augmentation
*SMART cities
*NATURAL language processing
*LANGUAGE models
*CLASSIFICATION algorithms

Details

Language :: English
ISSN :: 1574020X
Volume :: 58
Issue :: 2
Database :: Academic Search Index
Journal :: Language Resources & Evaluation
Publication Type :: Academic Journal
Accession number :: 178064686
Full Text :: https://doi.org/10.1007/s10579-023-09685-w
