Evaluation of the efficiency of state-of-the-art Speech Recognition engines.

Authors :: Trabelsi, Asma
Warichet, Sébastien
Aajaoun, Yassine
Soussilane, Séverine
Source :: Procedia Computer Science; 2022, Vol. 207, p2242-2252, 11p
Publication Year :: 2022
Abstract: Speech Recognition is one of many applications of Artificial Intelligence. It converts spoken words into text and can support a variety of everyday use cases, notably accessibility. Google Assistant and Amazon's Alexa are among the best-known Speech Recognition tools. European companies, however, cannot use these solutions because they must guarantee data sovereignty. These solutions also cannot be customized, so it is not possible to handle new accents or new vocabularies. To address these problems, one can either use European Automatic Speech Recognition (ASR) solutions or build personalized models using well-known open-source tools such as DeepSpeech or Kaldi. Choosing between Kaldi and DeepSpeech is an important task. The criteria for judging the best method are accuracy and inference time. In this paper, we present a theoretical and experimental comparison of DeepSpeech and Kaldi. Vosk and LinTO, open-source solutions built on top of Kaldi, are also included in the comparison. [ABSTRACT FROM AUTHOR]
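
The two criteria named in the abstract, accuracy and inference time, can be illustrated with a minimal sketch. The abstract does not say how accuracy is measured; the example below assumes word error rate (WER), a common ASR metric, computed as a word-level edit distance. The names `word_error_rate` and `timed_transcribe` are hypothetical helpers introduced here, and the `transcribe` argument stands in for any engine call (DeepSpeech, Kaldi, Vosk, LinTO) wrapped by the reader; none of these names come from the paper.

```python
import time
from typing import Any, Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words (assumed metric)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(transcribe: Callable[[Any], str], audio: Any) -> Tuple[str, float]:
    """Run a transcription callable and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = transcribe(audio)
    return text, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder engine: a real comparison would call an actual ASR model here.
    def fake_engine(_audio: bytes) -> str:
        return "the cat sit on mat"

    text, seconds = timed_transcribe(fake_engine, b"")
    wer = word_error_rate("the cat sat on the mat", text)
    print(f"WER={wer:.3f} inference_time={seconds:.6f}s")
```

Running the same reference transcripts and audio through each engine's wrapper would give comparable WER and timing figures, under the assumption that all engines are measured on the same hardware and text normalization.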