
Gemma 2: Improving Open Language Models at a Practical Size

Authors:
Gemma Team
Riviere, Morgane
Pathak, Shreya
Sessa, Pier Giuseppe
Hardin, Cassidy
Bhupatiraju, Surya
Hussenot, Léonard
Mesnard, Thomas
Shahriari, Bobak
Ramé, Alexandre
Ferret, Johan
Liu, Peter
Tafti, Pouya
Friesen, Abe
Casbon, Michelle
Ramos, Sabela
Kumar, Ravin
Le Lan, Charline
Jerome, Sammy
Tsitsulin, Anton
Vieillard, Nino
Stanczyk, Piotr
Girgin, Sertan
Momchev, Nikola
Hoffman, Matt
Thakoor, Shantanu
Grill, Jean-Bastien
Neyshabur, Behnam
Bachem, Olivier
Walton, Alanna
Severyn, Aliaksei
Parrish, Alicia
Ahmad, Aliya
Hutchison, Allen
Abdagic, Alvin
Carl, Amanda
Shen, Amy
Brock, Andy
Coenen, Andy
Laforge, Anthony
Paterson, Antonia
Bastian, Ben
Piot, Bilal
Wu, Bo
Royal, Brandon
Chen, Charlie
Kumar, Chintu
Perry, Chris
Welty, Chris
Choquette-Choo, Christopher A.
Sinopalnikov, Danila
Weinberger, David
Vijaykumar, Dimple
Rogozińska, Dominika
Herbison, Dustin
Bandy, Elisa
Wang, Emma
Noland, Eric
Moreira, Erica
Senter, Evan
Eltyshev, Evgenii
Visin, Francesco
Rasskin, Gabriel
Wei, Gary
Cameron, Glenn
Martins, Gus
Hashemi, Hadi
Klimczak-Plucińska, Hanna
Batra, Harleen
Dhand, Harsh
Nardini, Ivan
Mein, Jacinda
Zhou, Jack
Svensson, James
Stanway, Jeff
Chan, Jetha
Zhou, Jin Peng
Carrasqueira, Joana
Iljazi, Joana
Becker, Jocelyn
Fernandez, Joe
van Amersfoort, Joost
Gordon, Josh
Lipschultz, Josh
Newlan, Josh
Ji, Ju-yeong
Mohamed, Kareem
Badola, Kartikeya
Black, Kat
Millican, Katie
McDonell, Keelin
Nguyen, Kelvin
Sodhia, Kiranbir
Greene, Kish
Sjoesund, Lars Lowe
Usui, Lauren
Sifre, Laurent
Heuermann, Lena
Lago, Leticia
McNealus, Lilly
Soares, Livio Baldini
Kilpatrick, Logan
Dixon, Lucas
Martins, Luciano
Reid, Machel
Singh, Manvinder
Iverson, Mark
Görner, Martin
Velloso, Mat
Wirth, Mateo
Davidow, Matt
Miller, Matt
Rahtz, Matthew
Watson, Matthew
Risdal, Meg
Kazemi, Mehran
Moynihan, Michael
Zhang, Ming
Kahng, Minsuk
Park, Minwoo
Rahman, Mofi
Khatwani, Mohit
Dao, Natalie
Bardoliwalla, Nenshad
Devanathan, Nesh
Dumai, Neta
Chauhan, Nilay
Wahltinez, Oscar
Botarda, Pankil
Barnes, Parker
Barham, Paul
Michel, Paul
Jin, Pengchong
Georgiev, Petko
Culliton, Phil
Kuppala, Pradeep
Comanescu, Ramona
Merhej, Ramona
Jana, Reena
Rokni, Reza Ardeshir
Agarwal, Rishabh
Mullins, Ryan
Saadat, Samaneh
Mc Carthy, Sara
Cogan, Sarah
Perrin, Sarah
Arnold, Sébastien M. R.
Krause, Sebastian
Dai, Shengyang
Garg, Shruti
Sheth, Shruti
Ronstrom, Sue
Chan, Susan
Jordan, Timothy
Yu, Ting
Eccles, Tom
Hennigan, Tom
Kocisky, Tomas
Doshi, Tulsee
Jain, Vihan
Yadav, Vikas
Meshram, Vilobh
Dharmadhikari, Vishal
Barkley, Warren
Wei, Wei
Ye, Wenming
Han, Woohyun
Kwon, Woosuk
Xu, Xiang
Shen, Zhe
Gong, Zhitao
Wei, Zichuan
Cotruta, Victor
Kirk, Phoebe
Rao, Anand
Giang, Minh
Peran, Ludovic
Warkentin, Tris
Collins, Eli
Barral, Joelle
Ghahramani, Zoubin
Hadsell, Raia
Sculley, D.
Banks, Jeanine
Dragan, Anca
Petrov, Slav
Vinyals, Oriol
Dean, Jeff
Hassabis, Demis
Kavukcuoglu, Koray
Farabet, Clement
Buchatskaya, Elena
Borgeaud, Sebastian
Fiedel, Noah
Joulin, Armand
Kenealy, Kathleen
Dadashi, Robert
Andreev, Alek
Publication Year: 2024

Abstract

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
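The interleaved local-global attention the abstract refers to alternates sliding-window attention layers with full-context attention layers (Beltagy et al., 2020a). The record contains no code; below is a minimal Python/PyTorch sketch of how such alternating causal masks can be constructed. The even/odd alternation pattern and the window size are illustrative assumptions, not the published Gemma 2 configuration.

```python
import torch

def causal_attention_mask(seq_len: int, layer_idx: int, window: int = 4096) -> torch.Tensor:
    """Boolean [seq_len, seq_len] mask; True means the query may attend to the key.

    Even-indexed layers use local (sliding-window) attention and odd-indexed
    layers use global attention -- the alternation and window size here are
    placeholders, not the paper's configuration.
    """
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = k <= q                         # no attending to future tokens
    if layer_idx % 2 == 0:
        # Local layer: each query sees only the previous `window` tokens.
        return causal & (q - k < window)
    return causal                           # global layer: full causal context
```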
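Group-query attention (Ainslie et al., 2023) lets several query heads share a single key/value head, shrinking the KV cache at inference time. A minimal sketch, assuming PyTorch tensors; the function name and shapes are illustrative, and causal masking is omitted for brevity.

```python
import torch

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: [batch, n_q_heads, seq, dim]; k, v: [batch, n_kv_heads, seq, dim],
    where n_q_heads is an integer multiple of n_kv_heads."""
    group = q.size(1) // k.size(1)
    # Repeat each KV head so that `group` query heads share it.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # scaled dot product
    return torch.softmax(scores, dim=-1) @ v
```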
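Training the 2B and 9B models with knowledge distillation means fitting the student to a teacher's full next-token distribution rather than to one-hot targets (Hinton et al., 2015). A sketch of that objective, assuming PyTorch; the function name and temperature parameter are illustrative, not the Gemma team's implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Per-token KL(teacher || student) over the vocabulary.

    Both logit tensors have shape [batch, seq_len, vocab_size].
    """
    vocab = student_logits.size(-1)
    # Flatten positions so 'batchmean' averages over every token.
    s = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    t = F.softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```

Per the abstract, this objective replaces the standard cross-entropy on ground-truth next tokens rather than being mixed with it.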

Details

Database: arXiv
Publication Type: Report
Accession number: edsarx.2408.00118
Document Type: Working Paper