BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors :
Workshop, BigScience
Scao, Teven Le
Fan, Angela
Akiki, Christopher
Pavlick, Ellie
Ilić, Suzana
Hesslow, Daniel
Castagné, Roman
Luccioni, Alexandra Sasha
Yvon, François
Gallé, Matthias
Tow, Jonathan
Rush, Alexander M.
Biderman, Stella
Webson, Albert
Ammanamanchi, Pawan Sasanka
Wang, Thomas
Sagot, Benoît
Muennighoff, Niklas
del Moral, Albert Villanova
Ruwase, Olatunji
Bawden, Rachel
Bekman, Stas
McMillan-Major, Angelina
Beltagy, Iz
Nguyen, Huu
Saulnier, Lucile
Tan, Samson
Suarez, Pedro Ortiz
Sanh, Victor
Laurençon, Hugo
Jernite, Yacine
Launay, Julien
Mitchell, Margaret
Raffel, Colin
Gokaslan, Aaron
Simhi, Adi
Soroa, Aitor
Aji, Alham Fikri
Alfassy, Amit
Rogers, Anna
Nitzav, Ariel Kreisberg
Xu, Canwen
Mou, Chenghao
Emezue, Chris
Klamm, Christopher
Leong, Colin
van Strien, Daniel
Adelani, David Ifeoluwa
Radev, Dragomir
Ponferrada, Eduardo González
Levkovizh, Efrat
Kim, Ethan
Natan, Eyal Bar
De Toni, Francesco
Dupont, Gérard
Kruszewski, Germán
Pistilli, Giada
Elsahar, Hady
Benyamina, Hamza
Tran, Hieu
Yu, Ian
Abdulmumin, Idris
Johnson, Isaac
Gonzalez-Dios, Itziar
de la Rosa, Javier
Chim, Jenny
Dodge, Jesse
Zhu, Jian
Chang, Jonathan
Frohberg, Jörg
Tobing, Joseph
Bhattacharjee, Joydeep
Almubarak, Khalid
Chen, Kimbo
Lo, Kyle
Von Werra, Leandro
Weber, Leon
Phan, Long
Allal, Loubna Ben
Tanguy, Ludovic
Dey, Manan
Muñoz, Manuel Romero
Masoud, Maraim
Grandury, María
Šaško, Mario
Huang, Max
Coavoux, Maximin
Singh, Mayank
Jiang, Mike Tian-Jian
Vu, Minh Chien
Jauhar, Mohammad A.
Ghaleb, Mustafa
Subramani, Nishant
Kassner, Nora
Khamis, Nurulaqilla
Nguyen, Olivier
Espejel, Omar
de Gibert, Ona
Villegas, Paulo
Henderson, Peter
Colombo, Pierre
Amuok, Priscilla
Lhoest, Quentin
Harliman, Rheza
Bommasani, Rishi
López, Roberto Luis
Ribeiro, Rui
Osei, Salomey
Pyysalo, Sampo
Nagel, Sebastian
Bose, Shamik
Muhammad, Shamsuddeen Hassan
Sharma, Shanya
Longpre, Shayne
Nikpoor, Somaieh
Silberberg, Stanislav
Pai, Suhas
Zink, Sydney
Torrent, Tiago Timponi
Schick, Timo
Thrush, Tristan
Danchev, Valentin
Nikoulina, Vassilina
Laippala, Veronika
Lepercq, Violette
Prabhu, Vrinda
Alyafeai, Zaid
Talat, Zeerak
Raja, Arun
Heinzerling, Benjamin
Si, Chenglei
Taşar, Davut Emre
Salesky, Elizabeth
Mielke, Sabrina J.
Lee, Wilson Y.
Sharma, Abheesht
Santilli, Andrea
Chaffin, Antoine
Stiegler, Arnaud
Datta, Debajyoti
Szczechla, Eliza
Chhablani, Gunjan
Wang, Han
Pandey, Harshit
Strobelt, Hendrik
Fries, Jason Alan
Rozen, Jos
Gao, Leo
Sutawika, Lintang
Bari, M Saiful
Al-shaibani, Maged S.
Manica, Matteo
Nayak, Nihal
Teehan, Ryan
Albanie, Samuel
Shen, Sheng
Ben-David, Srulik
Bach, Stephen H.
Kim, Taewoon
Bers, Tali
Fevry, Thibault
Neeraj, Trishala
Thakker, Urmish
Raunak, Vikas
Tang, Xiangru
Yong, Zheng-Xin
Sun, Zhiqing
Brody, Shaked
Uri, Yallow
Tojarieh, Hadar
Roberts, Adam
Chung, Hyung Won
Tae, Jaesung
Phang, Jason
Press, Ofir
Li, Conglong
Narayanan, Deepak
Bourfoune, Hatim
Casper, Jared
Rasley, Jeff
Ryabinin, Max
Mishra, Mayank
Zhang, Minjia
Shoeybi, Mohammad
Peyrounette, Myriam
Patry, Nicolas
Tazi, Nouamane
Sanseviero, Omar
von Platen, Patrick
Cornette, Pierre
Lavallée, Pierre François
Lacroix, Rémi
Rajbhandari, Samyam
Gandhi, Sanchit
Smith, Shaden
Requena, Stéphane
Patil, Suraj
Dettmers, Tim
Baruwa, Ahmed
Singh, Amanpreet
Cheveleva, Anastasia
Ligozat, Anne-Laure
Subramonian, Arjun
Névéol, Aurélie
Lovering, Charles
Garrette, Dan
Tunuguntla, Deepak
Reiter, Ehud
Taktasheva, Ekaterina
Voloshina, Ekaterina
Bogdanov, Eli
Winata, Genta Indra
Schoelkopf, Hailey
Kalo, Jan-Christoph
Novikova, Jekaterina
Forde, Jessica Zosa
Clive, Jordan
Kasai, Jungo
Kawamura, Ken
Hazan, Liam
Carpuat, Marine
Clinciu, Miruna
Kim, Najoung
Cheng, Newton
Serikov, Oleg
Antverg, Omer
van der Wal, Oskar
Zhang, Rui
Zhang, Ruochen
Gehrmann, Sebastian
Mirkin, Shachar
Pais, Shani
Shavrina, Tatiana
Scialom, Thomas
Yun, Tian
Limisiewicz, Tomasz
Rieser, Verena
Protasov, Vitaly
Mikhailov, Vladislav
Pruksachatkun, Yada
Belinkov, Yonatan
Bamberger, Zachary
Kasner, Zdeněk
Rueda, Alice
Pestana, Amanda
Feizpour, Amir
Khan, Ammar
Faranak, Amy
Santos, Ana
Hevia, Anthony
Unldreaj, Antigona
Aghagol, Arash
Abdollahi, Arezoo
Tammour, Aycha
HajiHosseini, Azadeh
Behroozi, Bahareh
Ajibade, Benjamin
Saxena, Bharat
Ferrandis, Carlos Muñoz
McDuff, Daniel
Contractor, Danish
Lansky, David
David, Davis
Kiela, Douwe
Nguyen, Duong A.
Tan, Edward
Baylor, Emi
Ozoani, Ezinwanne
Mirza, Fatima
Ononiwu, Frankline
Rezanejad, Habib
Jones, Hessie
Bhattacharya, Indrani
Solaiman, Irene
Sedenko, Irina
Nejadgholi, Isar
Passmore, Jesse
Seltzer, Josh
Sanz, Julio Bonis
Dutra, Livia
Samagaio, Mairon
Elbadri, Maraim
Mieskes, Margot
Gerchick, Marissa
Akinlolu, Martha
McKenna, Michael
Qiu, Mike
Ghauri, Muhammed
Burynok, Mykola
Abrar, Nafis
Rajani, Nazneen
Elkott, Nour
Fahmy, Nour
Samuel, Olanrewaju
An, Ran
Kromann, Rasmus
Hao, Ryan
Alizadeh, Samira
Shubber, Sarmad
Wang, Silas
Roy, Sourav
Viguier, Sylvain
Le, Thanh
Oyebade, Tobi
Le, Trieu
Yang, Yoyo
Nguyen, Zach
Kashyap, Abhinav Ramesh
Palasciano, Alfredo
Callahan, Alison
Shukla, Anima
Miranda-Escalada, Antonio
Singh, Ayush
Beilharz, Benjamin
Wang, Bo
Brito, Caio
Zhou, Chenxi
Jain, Chirag
Xu, Chuxin
Fourrier, Clémentine
Periñán, Daniel León
Molano, Daniel
Yu, Dian
Manjavacas, Enrique
Barth, Fabio
Fuhrimann, Florian
Altay, Gabriel
Bayrak, Giyaseddin
Burns, Gully
Vrabec, Helena U.
Bello, Imane
Dash, Ishani
Kang, Jihyun
Giorgi, John
Golde, Jonas
Posada, Jose David
Sivaraman, Karthik Rangasai
Bulchandani, Lokesh
Liu, Lu
Shinzato, Luisa
de Bykhovetz, Madeleine Hahn
Takeuchi, Maiko
Pàmies, Marc
Castillo, Maria A
Nezhurina, Marianna
Sänger, Mario
Samwald, Matthias
Cullan, Michael
Weinberg, Michael
De Wolf, Michiel
Mihaljcic, Mina
Liu, Minna
Freidank, Moritz
Kang, Myungsun
Seelam, Natasha
Dahlberg, Nathan
Broad, Nicholas Michio
Muellner, Nikolaus
Fung, Pascale
Haller, Patrick
Chandrasekhar, Ramya
Eisenberg, Renata
Martin, Robert
Canalli, Rodrigo
Su, Rosaline
Su, Ruisi
Cahyawijaya, Samuel
Garda, Samuele
Deshmukh, Shlok S
Mishra, Shubhanshu
Kiblawi, Sid
Ott, Simon
Sang-aroonsiri, Sinee
Kumar, Srishti
Schweter, Stefan
Bharati, Sushil
Laud, Tanmay
Gigant, Théo
Kainuma, Tomoya
Kusa, Wojciech
Labrak, Yanis
Bajaj, Yash Shailesh
Venkatraman, Yash
Xu, Yifan
Xu, Yingxin
Xu, Yu
Tan, Zhe
Xie, Zhongli
Ye, Zifan
Bras, Mathilde
Belkada, Younes
Wolf, Thomas
Natural Language Processing : representations, inference and semantics (SYNALP)
Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH)
Inria de Paris
Institut National de Recherche en Informatique et en Automatique (Inria)
Laboratoire Interdisciplinaire des Sciences du Numérique (LISN)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Traitement du Langage Parlé (TLP)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Sciences et Technologies des Langues (STL)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Data and Web Science Group [Mannheim Univ] (DWS)
Universität Mannheim
Information, Langue Ecrite et Signée (ILES)
Avignon Université (AU)
Laboratoire Informatique d'Avignon (LIA)
Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Zenidoc
Publication Year :
2022
Publisher :
arXiv, 2022.

Abstract

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....5f2d58a6279ab32f1d7a245c041ed6a4
Full Text :
https://doi.org/10.48550/arxiv.2211.05100