1. Log-normal distribution in acoustic linguistic units
- Author
Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació; Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge; González Torre, Iván; Lacasa, Lucas; Kello, Christopher; Luque Serrano, Bartolome; Hernández Fernández, Antonio
- Abstract
In this work we verify that the acoustically transcribed durations of linguistic units at several scales (phonemes, words and breath groups) follow a log-normal distribution. To do this we use a well-known corpus of conversational speech by native English speakers, comprising approximately 3 × 10^5 words with time-aligned phonetic labels. Second, we explain this log-normal distribution with a new model, the Non-Interacting Cascade Approach (NICA). The NICA model can explain the emergence of log-normal distributions across linguistic levels (words, breath groups) based solely on the assumption that phoneme durations are also log-normal. We find extremely good quantitative agreement between NICA and the experimental duration distributions for phonemes and words, while the agreement is weaker for breath groups. Finally, we discuss our results and justify our recommendation to work with medians instead of mean values (whose use implicitly assumes a Gaussian distribution), in order to avoid biases and erroneous conclusions in statistical learning studies based on acoustic elements with long-tailed distributions. Postprint (published version)
- Published
- 2019
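
The abstract's recommendation to prefer medians over means for log-normal durations can be illustrated with a minimal sketch. The parameters `MU` and `SIGMA` below are hypothetical values chosen only for illustration; they are not taken from the paper or its corpus, and the code is not the authors' NICA model.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical log-normal parameters for durations in seconds
# (assumed for illustration, not values reported in the paper).
MU, SIGMA = np.log(0.08), 0.6

# Draw synthetic "durations" from a log-normal distribution.
durations = rng.lognormal(mean=MU, sigma=SIGMA, size=100_000)

sample_mean = durations.mean()
sample_median = np.median(durations)

# Closed-form values for a log-normal variable:
# median = exp(mu), mean = exp(mu + sigma^2 / 2).
theoretical_median = np.exp(MU)
theoretical_mean = np.exp(MU + SIGMA**2 / 2)

# Share of samples lying below the mean; for a symmetric (Gaussian)
# distribution this would be about one half.
share_below_mean = np.mean(durations < sample_mean)

print(f"sample mean   = {sample_mean:.4f} s (theory {theoretical_mean:.4f} s)")
print(f"sample median = {sample_median:.4f} s (theory {theoretical_median:.4f} s)")
print(f"share of samples below the mean = {share_below_mean:.3f}")
```

Because the long right tail pulls the mean above the median, the mean overstates the duration of a typical unit, which is the kind of bias the abstract warns about.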