1. Research software for dictating text
- Author
- Brunčić, Dalibor and Hocenski, Željko
- Subjects
automatic speech recognition, system testing, recognition accuracy, increasing accuracy, application programming, recognition error, word error rate, TECHNICAL SCIENCES. Computing. Process Computing
- Abstract
The basics of automatic speech recognition (ASR) technology are introduced. Topics covered include: practical uses of ASR systems; obstacles and challenges in designing a high-quality system; the principle of operation and the role of individual system components; the main characteristics and metrics of ASR systems; the algorithms, techniques, and models used in automatic speech recognition; and a historical overview of the development of ASR technology. Several new technologies are presented that substantially increase word recognition accuracy but have not yet been implemented in existing commercial programs. The capabilities of today's commercial programs were explored, along with the maximum recognition accuracy they can achieve, which proved to be very high (over 90%). The procedure for building a speech recognition program, configuring its input, and fine-tuning all required operating system settings is described. The resulting program was trained and thoroughly tested to determine which elements affect recognition accuracy and to what extent, and what maximum accuracy it can achieve. Firm conclusions are drawn, providing a basis for more easily understanding and following the further development of automatic speech recognition technology.
- Published
- 2014