1. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
- Author
Blessing Itoro Bassey, Ghollah Kioko, Masabata Mokgesi-Selinga, Mofe Adeyemi, Musie Meressa, Julia Kreutzer, Herman Kamper, Rubungo Andre Niyongabo, Chris Chinenye Emezue, Arshath Ramkilowan, Taiwo Fagbohungbe, Timi E. Fasubaa, Hady Elsahar, Salomey Osei, Daniel Whitenack, Tajudeen Kolawole, Ignatius Ezeani, Shamsuddeen Hassan Muhammad, Kathleen Siminyu, Jamiil Toure Ali, Adewale Akinfaderin, Tshinondiwa Matsila, Bonaventure F. P. Dossou, Vukosi Marivate, Wilhelmina Nekoto, Idris Abdulkabir Dangana, Iroro Orife, Lawrence Okegbemi, Espoir Murhabazi, Salomon Kabongo, Orevaoghene Ahia, Alp Öktem, Ricky Macharm, Elan van Biljon, Jade Abbott, Kolawole Tajudeen, Blessing Sibanda, Perez Ogayo, Solomon Oluwole Akinola, Kelechi Ogueji, Christopher Onyefuluchi, Abdallah Bashir, Ayodele Olabiyi, Sackey Freshia, Kevin Degila, Jason Webster, Goodness Duru, and Laura Martinus
- Subjects
FOS: Computer and information sciences; Computer Science - Machine Learning; Computer Science - Computation and Language; Machine translation; Computer Science - Artificial Intelligence; Process (engineering); Computer science; Languages of Africa; Participatory action research; 02 engineering and technology; computer.software_genre; Data science; Machine Learning (cs.LG); Task (project management); Focus (linguistics); 03 medical and health sciences; Artificial Intelligence (cs.AI); 0302 clinical medicine; 030221 ophthalmology & optometry; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computation and Language (cs.CL); computer
- Abstract
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.
Comment: Findings of EMNLP 2020; updated benchmarks
- Published
2020