BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Authors :
Workshop, BigScience; Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias; Tow, Jonathan; Rush, Alexander M.; Biderman, Stella; Webson, Albert; Ammanamanchi, Pawan Sasanka; Wang, Thomas; Sagot, Benoît; Muennighoff, Niklas; del Moral, Albert Villanova; Ruwase, Olatunji; Bawden, Rachel; Bekman, Stas; McMillan-Major, Angelina; Beltagy, Iz; Nguyen, Huu; Saulnier, Lucile; Tan, Samson; Suarez, Pedro Ortiz; Sanh, Victor; Laurençon, Hugo; Jernite, Yacine; Launay, Julien; Mitchell, Margaret; Raffel, Colin; Gokaslan, Aaron; Simhi, Adi; Soroa, Aitor; Aji, Alham Fikri; Alfassy, Amit; Rogers, Anna; Nitzav, Ariel Kreisberg; Xu, Canwen; Mou, Chenghao; Emezue, Chris; Klamm, Christopher; Leong, Colin; van Strien, Daniel; Adelani, David Ifeoluwa; Radev, Dragomir; Ponferrada, Eduardo González; Levkovizh, Efrat; Kim, Ethan; Natan, Eyal Bar; De Toni, Francesco; Dupont, Gérard; Kruszewski, Germán; Pistilli, Giada; Elsahar, Hady; Benyamina, Hamza; Tran, Hieu; Yu, Ian; Abdulmumin, Idris; Johnson, Isaac; Gonzalez-Dios, Itziar; de la Rosa, Javier; Chim, Jenny; Dodge, Jesse; Zhu, Jian; Chang, Jonathan; Frohberg, Jörg; Tobing, Joseph; Bhattacharjee, Joydeep; Almubarak, Khalid; Chen, Kimbo; Lo, Kyle; Von Werra, Leandro; Weber, Leon; Phan, Long; allal, Loubna Ben; Tanguy, Ludovic; Dey, Manan; Muñoz, Manuel Romero; Masoud, Maraim; Grandury, María; Šaško, Mario; Huang, Max; Coavoux, Maximin; Singh, Mayank; Jiang, Mike Tian-Jian; Vu, Minh Chien; Jauhar, Mohammad A.; Ghaleb, Mustafa; Subramani, Nishant; Kassner, Nora; Khamis, Nurulaqilla; Nguyen, Olivier; Espejel, Omar; de Gibert, Ona; Villegas, Paulo; Henderson, Peter; Colombo, Pierre; Amuok, Priscilla; Lhoest, Quentin; Harliman, Rheza; Bommasani, Rishi; López, Roberto Luis; Ribeiro, Rui; Osei, Salomey; Pyysalo, Sampo; Nagel, Sebastian; Bose, Shamik; Muhammad, Shamsuddeen Hassan; Sharma, Shanya; Longpre, Shayne; Nikpoor, Somaieh; Silberberg, Stanislav; Pai, Suhas; Zink, Sydney; Torrent, Tiago Timponi; Schick, Timo; Thrush, Tristan; Danchev, Valentin; Nikoulina, Vassilina; Laippala, Veronika; Lepercq, Violette; Prabhu, Vrinda; Alyafeai, Zaid; Talat, Zeerak; Raja, Arun; Heinzerling, Benjamin; Si, Chenglei; Taşar, Davut Emre; Salesky, Elizabeth; Mielke, Sabrina J.; Lee, Wilson Y.; Sharma, Abheesht; Santilli, Andrea; Chaffin, Antoine; Stiegler, Arnaud; Datta, Debajyoti; Szczechla, Eliza; Chhablani, Gunjan; Wang, Han; Pandey, Harshit; Strobelt, Hendrik; Fries, Jason Alan; Rozen, Jos; Gao, Leo; Sutawika, Lintang; Bari, M Saiful; Al-shaibani, Maged S.; Manica, Matteo; Nayak, Nihal; Teehan, Ryan; Albanie, Samuel; Shen, Sheng; Ben-David, Srulik; Bach, Stephen H.; Kim, Taewoon; Bers, Tali; Fevry, Thibault; Neeraj, Trishala; Thakker, Urmish; Raunak, Vikas; Tang, Xiangru; Yong, Zheng-Xin; Sun, Zhiqing; Brody, Shaked; Uri, Yallow; Tojarieh, Hadar; Roberts, Adam; Chung, Hyung Won; Tae, Jaesung; Phang, Jason; Press, Ofir; Li, Conglong; Narayanan, Deepak; Bourfoune, Hatim; Casper, Jared; Rasley, Jeff; Ryabinin, Max; Mishra, Mayank; Zhang, Minjia; Shoeybi, Mohammad; Peyrounette, Myriam; Patry, Nicolas; Tazi, Nouamane; Sanseviero, Omar; von Platen, Patrick; Cornette, Pierre; Lavallée, Pierre François; Lacroix, Rémi; Rajbhandari, Samyam; Gandhi, Sanchit; Smith, Shaden; Requena, Stéphane; Patil, Suraj; Dettmers, Tim; Baruwa, Ahmed; Singh, Amanpreet; Cheveleva, Anastasia; Ligozat, Anne-Laure; Subramonian, Arjun; Névéol, Aurélie; Lovering, Charles; Garrette, Dan; Tunuguntla, Deepak; Reiter, Ehud; Taktasheva, Ekaterina; Voloshina, Ekaterina; Bogdanov, Eli; Winata, Genta Indra; Schoelkopf, Hailey; Kalo, Jan-Christoph; Novikova, Jekaterina; Forde, Jessica Zosa; Clive, Jordan; Kasai, Jungo; Kawamura, Ken; Hazan, Liam; Carpuat, Marine; Clinciu, Miruna; Kim, Najoung; Cheng, Newton; Serikov, Oleg; Antverg, Omer; van der Wal, Oskar; Zhang, Rui; Zhang, Ruochen; Gehrmann, Sebastian; Mirkin, Shachar; Pais, Shani; Shavrina, Tatiana; Scialom, Thomas; Yun, Tian; Limisiewicz, Tomasz; Rieser, Verena; Protasov, Vitaly; Mikhailov, Vladislav; Pruksachatkun, Yada; Belinkov, Yonatan; Bamberger, Zachary; Kasner, Zdeněk; Rueda, Alice; Pestana, Amanda; Feizpour, Amir; Khan, Ammar; Faranak, Amy; Santos, Ana; Hevia, Anthony; Unldreaj, Antigona; Aghagol, Arash; Abdollahi, Arezoo; Tammour, Aycha; HajiHosseini, Azadeh; Behroozi, Bahareh; Ajibade, Benjamin; Saxena, Bharat; Ferrandis, Carlos Muñoz; McDuff, Daniel; Contractor, Danish; Lansky, David; David, Davis; Kiela, Douwe; Nguyen, Duong A.; Tan, Edward; Baylor, Emi; Ozoani, Ezinwanne; Mirza, Fatima; Ononiwu, Frankline; Rezanejad, Habib; Jones, Hessie; Bhattacharya, Indrani; Solaiman, Irene; Sedenko, Irina; Nejadgholi, Isar; Passmore, Jesse; Seltzer, Josh; Sanz, Julio Bonis; Dutra, Livia; Samagaio, Mairon; Elbadri, Maraim; Mieskes, Margot; Gerchick, Marissa; Akinlolu, Martha; McKenna, Michael; Qiu, Mike; Ghauri, Muhammed; Burynok, Mykola; Abrar, Nafis; Rajani, Nazneen; Elkott, Nour; Fahmy, Nour; Samuel, Olanrewaju; An, Ran; Kromann, Rasmus; Hao, Ryan; Alizadeh, Samira; Shubber, Sarmad; Wang, Silas; Roy, Sourav; Viguier, Sylvain; Le, Thanh; Oyebade, Tobi; Le, Trieu; Yang, Yoyo; Nguyen, Zach; Kashyap, Abhinav Ramesh; Palasciano, Alfredo; Callahan, Alison; Shukla, Anima; Miranda-Escalada, Antonio; Singh, Ayush; Beilharz, Benjamin; Wang, Bo; Brito, Caio; Zhou, Chenxi; Jain, Chirag; Xu, Chuxin; Fourrier, Clémentine; Periñán, Daniel León; Molano, Daniel; Yu, Dian; Manjavacas, Enrique; Barth, Fabio; Fuhrimann, Florian; Altay, Gabriel; Bayrak, Giyaseddin; Burns, Gully; Vrabec, Helena U.; Bello, Imane; Dash, Ishani; Kang, Jihyun; Giorgi, John; Golde, Jonas; Posada, Jose David; Sivaraman, Karthik Rangasai; Bulchandani, Lokesh; Liu, Lu; Shinzato, Luisa; de Bykhovetz, Madeleine Hahn; Takeuchi, Maiko; Pàmies, Marc; Castillo, Maria A; Nezhurina, Marianna; Sänger, Mario; Samwald, Matthias; Cullan, Michael; Weinberg, Michael; De Wolf, Michiel; Mihaljcic, Mina; Liu, Minna; Freidank, Moritz; Kang, Myungsun; Seelam, Natasha; Dahlberg, Nathan; Broad, Nicholas Michio; Muellner, Nikolaus; Fung, Pascale; Haller, Patrick; Chandrasekhar, Ramya; Eisenberg, Renata; Martin, Robert; Canalli, Rodrigo; Su, Rosaline; Su, Ruisi; Cahyawijaya, Samuel; Garda, Samuele; Deshmukh, Shlok S; Mishra, Shubhanshu; Kiblawi, Sid; Ott, Simon; Sang-aroonsiri, Sinee; Kumar, Srishti; Schweter, Stefan; Bharati, Sushil; Laud, Tanmay; Gigant, Théo; Kainuma, Tomoya; Kusa, Wojciech; Labrak, Yanis; Bajaj, Yash Shailesh; Venkatraman, Yash; Xu, Yifan; Xu, Yingxin; Xu, Yu; Tan, Zhe; Xie, Zhongli; Ye, Zifan; Bras, Mathilde; Belkada, Younes; Wolf, Thomas
Affiliations :
Natural Language Processing: representations, inference and semantics (SYNALP), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS)
Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Inria
Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Inria, CentraleSupélec, Université Paris-Saclay, CNRS
Traitement du Langage Parlé (TLP), Sciences et Technologies des Langues (STL), Inria, CentraleSupélec, Université Paris-Saclay, CNRS
Data and Web Science Group [Mannheim Univ] (DWS), Universität Mannheim
Information, Langue Ecrite et Signée (ILES)
Avignon Université (AU)
Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU), Centre d'Enseignement et de Recherche en Informatique (CERI)
Zenidoc
Publication Year :
2022
Publisher :
arXiv, 2022.
Abstract
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
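"Decoder-only" means that, when the model processes a sequence, each token can attend only to itself and to earlier tokens. The toy sketch below illustrates that causal masking with single-head scaled dot-product attention in plain Python. The function names are hypothetical, and this is not BLOOM's implementation, which adds learned projections, multiple heads, positional information, and much more.

```python
import math
from typing import List

Vector = List[float]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf scores receive zero weight."""
    m = max(scores)  # finite, because position i always attends to itself
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Toy single-head attention in which no position can see a later one."""
    n, d = len(q), len(q[0])
    mask = causal_mask(n)
    out: List[Vector] = []
    for i in range(n):
        # Scaled dot-product scores; future positions are masked with -inf.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out
```

Changing the last token of a sequence leaves the outputs at earlier positions unchanged. This property lets such a model be trained to predict each next token from its prefix alone.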