1. Automatic speech recognition of poor quality audio using generative adversarial networks
- Author
- Heymans, Walter; Davel, M.H.; and 23607955 - Davel Marelie Hattingh (Supervisor)
- Subjects
- Generative adversarial networks, Call centre audio, WAV49 encoding, Automatic speech recognition, Multi-style training
- Abstract
- MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
- In this study, we investigate the use of generative adversarial networks (GANs) to improve automatic speech recognition (ASR) performance on poor-quality audio obtained from a real-world source. A GAN is developed to transform the acoustic features of noisy audio prior to downstream acoustic modelling. The system utilises a baseline acoustic model trained on good-quality data to improve performance on mismatched data. This is achieved without requiring the manual creation of parallel datasets. The practical relevance of the GAN is realised when a strong commercial-grade speech recognition system, which has already been optimised for a given set of conditions, is required to decode new mismatched data. The GAN can then act as a front-end to the existing system. We compare the GAN-based front-end to multi-style training (MTR) on three datasets in a controlled environment. The GAN system achieves performance similar to that of a comparable MTR system while being much faster to train. The developed GAN is applied to a South African call centre dataset and achieves consistent improvements over a baseline model. The approach therefore offers a practical way to improve ASR systems in mismatched environments.
- Masters
- Published
- 2022