VocDoc, what happened to my voice? Towards automatically capturing vocal fatigue in the wild.

Authors :
Pokorny, Florian B.
Linke, Julian
Seddiki, Nico
Lohrmann, Simon
Gerstenberger, Claus
Haspl, Katja
Feiner, Marlies
Eyben, Florian
Hagmüller, Martin
Schuppler, Barbara
Kubin, Gernot
Gugatschka, Markus
Source :
Biomedical Signal Processing & Control; Feb2024:Part B, Vol. 88, pN.PAG-N.PAG, 1p
Publication Year :
2024

Abstract

Voice problems that arise during everyday vocal use can hardly be captured by standard outpatient voice assessments. In preparation for a digital health application to automatically assess longitudinal voice data 'in the wild' – the VocDoc – the aim of this paper was to study vocal fatigue from the speaker's perspective, the healthcare professional's perspective, and the 'machine's' perspective. We collected data from four voice-healthy speakers completing a 90-min reading task. Every 10 min the speakers were asked about subjective voice characteristics. Then, we elaborated on the task of elapsed speaking time recognition: We carried out listening experiments with speech and language therapists and employed random forests on the basis of extracted acoustic features. We validated our models speaker-dependently and speaker-independently and analysed underlying feature importances. For an additional, clinical application-oriented scenario, we extended our dataset with lecture recordings of another two speakers. Self- and expert-assessments were not consistent. With mean F1 scores up to 0.78, automatic elapsed speaking time recognition worked reliably in the speaker-dependent scenario only. A small set of acoustic features – other than features previously reported to reflect vocal fatigue – was found to universally describe long-term variations of the voice. Vocal fatigue seems to have individual effects across different speakers. Machine learning has the potential to automatically detect and characterise vocal changes over time. Our study provides technical underpinnings for a future mobile solution to objectively capture pathological long-term voice variations in everyday life settings and make them clinically accessible.

• A few acoustic features seem to universally describe vocal fatigue.
• Vocal fatigue has rather individual effects across different speakers.
• Machine learning has the potential to automatically detect effects of vocal fatigue.
• A mobile app can capture clinically relevant long-term voice variations in the wild.

[ABSTRACT FROM AUTHOR]
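
The abstract distinguishes speaker-dependent from speaker-independent validation and reports mean F1 scores. The sketch below illustrates that distinction only; it is not the authors' pipeline. It assumes a leave-one-speaker-out split for the speaker-independent case, a simple interleaved k-fold split inside each speaker for the speaker-dependent case, and a macro-averaged F1. None of these choices is confirmed by the record. All function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def leave_one_speaker_out(speakers):
    """Speaker-independent folds (assumed scheme): test on one unseen speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def within_speaker_folds(speakers, k):
    """Speaker-dependent folds (assumed scheme): train and test on the same speaker."""
    by_speaker = defaultdict(list)
    for i, s in enumerate(speakers):
        by_speaker[s].append(i)
    for speaker in sorted(by_speaker):
        indices = by_speaker[speaker]
        for fold in range(k):
            test = indices[fold::k]  # every k-th sample of this speaker
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield speaker, train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; the record only says 'mean F1')."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy speaker IDs, one per audio segment (hypothetical data).
    speakers = ["A", "A", "B", "B", "C", "C"]
    for held_out, train, test in leave_one_speaker_out(speakers):
        print("independent", held_out, train, test)
    for speaker, train, test in within_speaker_folds(speakers, k=2):
        print("dependent", speaker, train, test)
    # Toy elapsed-speaking-time labels (hypothetical classes).
    print(macro_f1(["early", "late", "late"], ["early", "late", "early"]))
```

In a real experiment, a classifier such as a random forest would be fitted on the `train` indices of each fold and scored on the `test` indices. A gap between the two schemes, like the one the abstract reports, would then indicate that the learned acoustic patterns are specific to individual speakers.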

Details

Language :
English
ISSN :
17468094
Volume :
88
Database :
Supplemental Index
Journal :
Biomedical Signal Processing & Control
Publication Type :
Academic Journal
Accession number :
173629430
Full Text :
https://doi.org/10.1016/j.bspc.2023.105595