Synthetic training set generation using text-to-audio models for environmental sound classification

Authors :
Ronchini, Francesca
Comanducci, Luca
Antonacci, Fabio
Publication Year :
2024

Abstract

In recent years, text-to-audio models have revolutionized the field of automatic audio generation. This paper investigates their application in generating synthetic datasets for training data-driven models. Specifically, this study analyzes the performance of two environmental sound classification systems trained with data generated from text-to-audio models. We considered three scenarios: a) augmenting the training dataset with data generated by text-to-audio models; b) using a mixed training dataset combining real and synthetic text-driven generated data; and c) using a training dataset composed entirely of synthetic audio. In all cases, the performance of the classification models was tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, with consistent performance when replacing a subset of the recorded dataset. However, the performance of the audio recognition models drops when relying entirely on generated audio.
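The three training scenarios in the abstract can be illustrated with a minimal sketch. The abstract gives no mixing proportions or data layout, so everything here is an assumption for illustration only: the function name `build_training_set`, the `(clip_path, label)` pair format, the scenario names, and the `replace_fraction` parameter are hypothetical and are not taken from the paper. Class balance is ignored for brevity.

```python
import random


def build_training_set(real, synthetic, scenario, replace_fraction=0.5, seed=0):
    """Return a list of (clip_path, label) pairs for one training scenario.

    Hypothetical helper: scenario names and replace_fraction are
    illustrative assumptions, not values reported by the authors.
    """
    rng = random.Random(seed)
    real = list(real)
    synthetic = list(synthetic)

    if scenario == "augment":
        # (a) keep every recorded clip and add the generated clips on top
        return real + synthetic

    if scenario == "mixed":
        # (b) replace a share of the recorded clips with generated ones,
        # keeping the total training set size unchanged
        n_swap = int(len(real) * replace_fraction)
        if n_swap > len(synthetic):
            raise ValueError("not enough synthetic clips to replace that share")
        kept = rng.sample(real, len(real) - n_swap)
        added = rng.sample(synthetic, n_swap)
        return kept + added

    if scenario == "synthetic_only":
        # (c) train on generated audio only
        return synthetic

    raise ValueError(f"unknown scenario: {scenario!r}")


# Toy file lists standing in for recorded and generated clips.
real_clips = [(f"real_{i}.wav", "dog_bark") for i in range(4)]
synthetic_clips = [(f"syn_{i}.wav", "dog_bark") for i in range(4)]

augmented = build_training_set(real_clips, synthetic_clips, "augment")
mixed = build_training_set(real_clips, synthetic_clips, "mixed", replace_fraction=0.5)
synthetic_only = build_training_set(real_clips, synthetic_clips, "synthetic_only")
```

In all three cases the evaluation set would remain recorded audio only, as the abstract states; the sketch covers the construction of the training set and nothing else.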

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2403.17864
Document Type :
Working Paper