Trends and developments in automatic speech recognition research.

Authors :
O'Shaughnessy, Douglas
Source :
Computer Speech & Language. Jan 2024, Vol. 83, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

This paper discusses how automatic speech recognition (ASR) systems are, and could be, designed to best exploit the discriminative information encoded in human speech. This contrasts with many recent machine learning approaches that apply general recognition architectures to the signals to be identified, with little concern for the nature of the input. The implicit assumption has often been that training can automatically discover the useful properties that exist in signals, with minimal manual intervention. These approaches may be suitable for some tasks such as image recognition, where the diversity of visual input is vast; e.g., an image may be any (natural or synthetic) scene that a camera views. We first examine what makes speech special, i.e., a natural signal from a complex tube, driven by a source that is quasi-periodic and/or noisy, aiming to communicate a wide variety of information, using the different vocal systems of human speakers. Then, we view how pertinent features are extracted from speech via efficient means, related to the objectives of communication. We see how to reliably and efficiently identify the different units of oral language. We learn from the history of attempts to do ASR, e.g., why they succeeded and how improved methods exploited the increasing availability of data and computer power (in particular, deep neural networks). Finally, we suggest ways to render ASR both more accurate and more efficient. This work is aimed at both newcomers to ASR and experts; it presents issues broadly, without the mathematical or algorithmic details that are readily found in the references. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
08852308
Volume :
83
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
171991635
Full Text :
https://doi.org/10.1016/j.csl.2023.101538