1. MasakhaNER: Named entity recognition for African languages
- Author
Julia Kreutzer, Ayodele Awokoya, Ignatius Ezeani, Rubungo Andre Niyongabo, Happy Buzaaba, Adewale Akinfaderin, Samuel Oyerinde, Stephen Mayhew, Emmanuel Anebi, Mofetoluwa Adeyemi, Kelechi Ogueji, Abdoulaye Diallo, Seid Muhie Yimam, Jade Abbott, Joyce Nakatumba-Nabende, Victor Akinode, Blessing Sibanda, Catherine Gitau, Chester Palen-Michel, Shamsuddeen Hassan Muhammad, Degaga Wolde, Graham Neubig, Tendai Marengereke, Paul Rayson, Derguene Mbaye, Eric Peter Wairagala, Daniel D'souza, Tosin P. Adewumi, Jonathan Mukiibi, Chris Chinenye Emezue, David Ifeoluwa Adelani, Shruti Rijhwani, Iroro Orife, Verrah Otiende, Maurice Katusiime, Yvonne Wambui, Dibora Gebreyohannes, Kelechi Nwaike, Salomey Osei, Chiamaka Chukwuneke, Henok Tilaye, Deborah Nabagereka, Thierno Ibrahima Diop, Orevaoghene Ahia, Jesujoba O. Alabi, Sebastian Ruder, Davis David, Mouhamadane Mboup, Samba Ngom, Tajuddeen R. Gwadabe, Bonaventure F. P. Dossou, Temilola Oloyede, Perez Ogayo, Clemencia Siro, Gerald Muriuki, Aremu Anuoluwapo, Nkiruka Odu, Tobius Saul Bateesa, Abdoulaye Faye, Israel Abebe Azime, Constantine Lignos, Saarland University [Saarbrücken], Masakhane NLP, Retro Rabbit, Carnegie Mellon University [Pittsburgh] (CMU), ProQuest, Google Research, Brandeis University, Université de Tsukuba = University of Tsukuba, DeepMind, DeepMind Technologies, Duolingo, African Institute for Mathematical Sciences (AIMS), University of Porto, Bayero University Kano (BUK), Technische Universität Munchen - Université Technique de Munich [Munich, Allemagne] (TUM), Makerere University [Kampala, Ouganda] (MAK), African Leadership University, University of Lagos, Max Planck Institute for Informatics [Saarbrücken], Universität Hamburg (UHH), University of Chinese Academy of Sciences [Beijing] (UCAS), Lancaster University, University of Electronic Science and Technology of China (UESTC), United States International University - Africa, Niger-Volta Language Technologies Institute, Luleå University of Technology (LUT), 
African University of Science and Technology (AUST), University of Ibadan, Namibia University of Science and Technology (NUST), InstaDeep, Jacobs University [Bremen], University of Waterloo [Waterloo], European Project: 825081, H2020, COMPRISE (2018), Technical University of Munich (TUM), DeepMind [London], Universidade do Porto = University of Porto, and University of Electronic Science and Technology of China [Chengdu] (UESTC)
- Subjects
FOS: Computer and information sciences, Linguistics and Language, Computer Science - Computation and Language, Computer science, business.industry, Computer Science - Artificial Intelligence, Communication, Languages of Africa, computer.software_genre, Code (semiotics), [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Computer Science Applications, Human-Computer Interaction, Artificial Intelligence (cs.AI), Named-entity recognition, Artificial Intelligence, Artificial intelligence, Transfer of learning, business, computer, Computation and Language (cs.CL), Natural language processing
- Abstract
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.
Comment: Accepted to TACL 2021, pre-MIT Press publication version
- Published
- 2021