
Languages Worldwide and the World Wide Web: Crowdsourcing on the Internet to Explore Linguistic Theories

Authors :: Hutin, Mathilde
Allassonnière-Tang, Marc
Laboratoire Interdisciplinaire des Sciences du Numérique (LISN)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Traitement du Langage Parlé (TLP)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Sciences et Technologies des Langues (STL)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Éco-Anthropologie (EA)
Muséum national d'Histoire naturelle (MNHN)-Centre National de la Recherche Scientifique (CNRS)
Source :: 2022, 978-951-39-9450-1
Publication Year :: 2022
Publisher :: HAL CCSD, 2022.
Abstract: The world's vocal languages are estimated to number approximately 6000, yet only a handful of them are well-resourced, which limits typological investigations, i.e., language-comparison studies aiming at understanding universal trends in language. Crowd-sourced data could help create homogeneous multilingual corpora and therefore provide a revolutionary tool giving researchers access to large amounts of data in rare or remote languages. Yet crowd-sourced data are usually recorded with non-professional tools in non-silent environments, which poses a challenge to anyone wishing to use them for phonetic research. In this paper, we show how crowd-sourced data can contribute to academic research by using audio files from Lingua Libre, Wikimedia France’s open-access linguistic library, to test the Inventory Size Hypothesis. This hypothesis suggests that the more phonological vowel categories a language has, the less internal phonetic variation its vowels will display. The platform allows us to investigate acoustic measurements of the three cardinal vowels /a/, /i/ and /u/ in 7 less-resourced languages with various numbers of vowel categories. Our results replicate those of previous literature, which suggests that our methodology is promising. Lingua Libre thus makes it possible to investigate a scientific question with theoretical implications for larger models of communication, and to bridge the gap between well-resourced and less-resourced languages in an inclusive, homogeneous data set of the world’s languages.

Subjects :: [SCCO.LING]Cognitive science/Linguistics

Details

Language :: English
ISBN :: 978-951-39-9450-1
ISBNs :: 9789513994501
Database :: OpenAIRE
Journal :: 2022, 978-951-39-9450-1
Accession number :: edsair.od.......165..760e09d635ac340ef9eea848c013362e
