Automatic prediction of intelligible speaking rate for individuals with ALS from speech acoustic and articulatory samples
- Author
Jordan R. Green, Thomas F. Campbell, Andrea Bandini, Yana Yunusova, Prasanna V. Kothalkar, Myung Jong Kim, Daragh Heitzman, Jun Wang, and Beiming Cao
- Subjects
Adult, Aged, Middle Aged, Male, Female, Humans, Amyotrophic Lateral Sclerosis, Dysarthria, Speech Disorders, Speech Acoustics, Speech Intelligibility, Speech Production Measurement, Speech Recognition Software, Tongue, intelligible speaking rate, speech kinematics, machine learning, support vector machine, words per minute, Speech and Hearing, Otorhinolaryngology, Language and Linguistics, Research and Theory
- Abstract
PURPOSE: This research aimed to automatically predict intelligible speaking rate for individuals with Amyotrophic Lateral Sclerosis (ALS) from speech acoustic and articulatory samples. METHOD: Twelve participants with ALS and two healthy control participants produced a total of 1,831 phrases. The NDI Wave system was used to record tongue movement, lip movement, and acoustic data synchronously. A machine learning algorithm (a support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples. RESULTS: Acoustic, lip movement, and tongue movement information separately yielded R² values of 0.652, 0.660, and 0.678 and root-mean-square errors (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between predicted and actual values, respectively. Combining acoustic, lip, and tongue information yielded the highest R² (0.712) and the lowest RMSE (37.562 WPM). CONCLUSION: The results revealed that the proposed analyses predicted participants' intelligible speaking rate with reasonably high accuracy from acoustic and/or articulatory features extracted from one short speech sample. With further development, the analyses may be well suited for clinical applications that require automatic prediction of speech severity.
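The modeling approach described in the abstract (support vector machine regression from per-phrase features, evaluated with R² and RMSE in WPM) can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature matrix, target values, split, and hyperparameters below are hypothetical stand-ins on synthetic data, shown only to make the prediction-and-evaluation setup concrete.

```python
# Hedged sketch of SVM regression for intelligible speaking rate (WPM).
# All data here are synthetic placeholders, not the study's recordings.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n_phrases, n_features = 200, 24                      # placeholder sizes
# Stand-in for acoustic + tongue/lip movement features per phrase:
X = rng.normal(size=(n_phrases, n_features))
w = rng.normal(size=n_features)
# Synthetic intelligible-speaking-rate targets in words per minute:
y = 120 + 5 * (X @ w) + rng.normal(scale=10, size=n_phrases)

train, test = slice(0, 150), slice(150, None)        # simple hold-out split
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[train], y[train])

pred = model.predict(X[test])
r2 = r2_score(y[test], pred)
rmse = np.sqrt(mean_squared_error(y[test], pred))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.1f} WPM")
```

In practice the per-phrase features would come from the synchronized acoustic and articulatory recordings, and evaluation would use a subject-aware split rather than the simple hold-out shown here.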
- Published
- 2018