157 results for "Goyal, Naman"
Search Results
2. The Llama 3 Herd of Models
- Author
Grattafiori, Aaron, Dubey, Abhimanyu, Jauhri, Abhinav, Pandey, Abhinav, Kadian, Abhishek, Al-Dahle, Ahmad, Letman, Aiesha, Mathur, Akhil, Schelten, Alan, Vaughan, Alex, Yang, Amy, Fan, Angela, Goyal, Anirudh, Hartshorn, Anthony, Yang, Aobo, Mitra, Archi, Sravankumar, Archie, Korenev, Artem, Hinsvark, Arthur, Rao, Arun, Zhang, Aston, Rodriguez, Aurelien, Gregerson, Austen, Spataru, Ava, Roziere, Baptiste, Biron, Bethany, Tang, Binh, Chern, Bobbie, Caucheteux, Charlotte, Nayak, Chaya, Bi, Chloe, Marra, Chris, McConnell, Chris, Keller, Christian, Touret, Christophe, Wu, Chunyang, Wong, Corinne, Ferrer, Cristian Canton, Nikolaidis, Cyrus, Allonsius, Damien, Song, Daniel, Pintz, Danielle, Livshits, Danny, Wyatt, Danny, Esiobu, David, Choudhary, Dhruv, Mahajan, Dhruv, Garcia-Olano, Diego, Perino, Diego, Hupkes, Dieuwke, Lakomkin, Egor, AlBadawy, Ehab, Lobanova, Elina, Dinan, Emily, Smith, Eric Michael, Radenovic, Filip, Guzmán, Francisco, Zhang, Frank, Synnaeve, Gabriel, Lee, Gabrielle, Anderson, Georgia Lewis, Thattai, Govind, Nail, Graeme, Mialon, Gregoire, Pang, Guan, Cucurell, Guillem, Nguyen, Hailey, Korevaar, Hannah, Xu, Hu, Touvron, Hugo, Zarov, Iliyan, Ibarra, Imanol Arrieta, Kloumann, Isabel, Misra, Ishan, Evtimov, Ivan, Zhang, Jack, Copet, Jade, Lee, Jaewon, Geffert, Jan, Vranes, Jana, Park, Jason, Mahadeokar, Jay, Shah, Jeet, van der Linde, Jelmer, Billock, Jennifer, Hong, Jenny, Lee, Jenya, Fu, Jeremy, Chi, Jianfeng, Huang, Jianyu, Liu, Jiawen, Wang, Jie, Yu, Jiecao, Bitton, Joanna, Spisak, Joe, Park, Jongsoo, Rocca, Joseph, Johnstun, Joshua, Saxe, Joshua, Jia, Junteng, Alwala, Kalyan Vasuden, Prasad, Karthik, Upasani, Kartikeya, Plawiak, Kate, Li, Ke, Heafield, Kenneth, Stone, Kevin, El-Arini, Khalid, Iyer, Krithika, Malik, Kshitiz, Chiu, Kuenley, Bhalla, Kunal, Lakhotia, Kushal, Rantala-Yeary, Lauren, van der Maaten, Laurens, Chen, Lawrence, Tan, Liang, Jenkins, Liz, Martin, Louis, Madaan, Lovish, Malo, Lubo, Blecher, Lukas, Landzaat, Lukas, de Oliveira, Luke, Muzzi, Madeline, Pasupuleti, Mahesh, Singh, Mannat, Paluri, Manohar, Kardas, Marcin, Tsimpoukelli, Maria, Oldham, Mathew, Rita, Mathieu, Pavlova, Maya, Kambadur, Melanie, Lewis, Mike, Si, Min, Singh, Mitesh Kumar, Hassan, Mona, Goyal, Naman, Torabi, Narjes, Bashlykov, Nikolay, Bogoychev, Nikolay, Chatterji, Niladri, Zhang, Ning, Duchenne, Olivier, Çelebi, Onur, Alrassy, Patrick, Zhang, Pengchuan, Li, Pengwei, Vasic, Petar, Weng, Peter, Bhargava, Prajjwal, Dubal, Pratik, Krishnan, Praveen, Koura, Punit Singh, Xu, Puxin, He, Qing, Dong, Qingxiao, Srinivasan, Ragavan, Ganapathy, Raj, Calderer, Ramon, Cabral, Ricardo Silveira, Stojnic, Robert, Raileanu, Roberta, Maheswari, Rohan, Girdhar, Rohit, Patel, Rohit, Sauvestre, Romain, Polidoro, Ronnie, Sumbaly, Roshan, Taylor, Ross, Silva, Ruan, Hou, Rui, Wang, Rui, Hosseini, Saghar, Chennabasappa, Sahana, Singh, Sanjay, Bell, Sean, Kim, Seohyun Sonia, Edunov, Sergey, Nie, Shaoliang, Narang, Sharan, Raparthy, Sharath, Shen, Sheng, Wan, Shengye, Bhosale, Shruti, Zhang, Shun, Vandenhende, Simon, Batra, Soumya, Whitman, Spencer, Sootla, Sten, Collot, Stephane, Gururangan, Suchin, Borodinsky, Sydney, Herman, Tamar, Fowler, Tara, Sheasha, Tarek, Georgiou, Thomas, Scialom, Thomas, Speckbacher, Tobias, Mihaylov, Todor, Xiao, Tong, Karn, Ujjwal, Goswami, Vedanuj, Gupta, Vibhor, Ramanathan, Vignesh, Kerkez, Viktor, Gonguet, Vincent, Do, Virginie, Vogeti, Vish, Albiero, Vítor, Petrovic, Vladan, Chu, Weiwei, Xiong, Wenhan, Fu, Wenyin, Meers, Whitney, Martinet, Xavier, Wang, Xiaodong, Wang, 
Xiaofang, Tan, Xiaoqing Ellen, Xia, Xide, Xie, Xinfeng, Jia, Xuchao, Wang, Xuewei, Goldschlag, Yaelle, Gaur, Yashesh, Babaei, Yasmine, Wen, Yi, Song, Yiwen, Zhang, Yuchen, Li, Yue, Mao, Yuning, Coudert, Zacharie Delpierre, Yan, Zheng, Chen, Zhengxing, Papakipos, Zoe, Singh, Aaditya, Srivastava, Aayushi, Jain, Abha, Kelsey, Adam, Shajnfeld, Adam, Gangidi, Adithya, Victoria, Adolfo, Goldstand, Ahuva, Menon, Ajay, Sharma, Ajay, Boesenberg, Alex, Baevski, Alexei, Feinstein, Allie, Kallet, Amanda, Sangani, Amit, Teo, Amos, Yunus, Anam, Lupu, Andrei, Alvarado, Andres, Caples, Andrew, Gu, Andrew, Ho, Andrew, Poulton, Andrew, Ryan, Andrew, Ramchandani, Ankit, Dong, Annie, Franco, Annie, Goyal, Anuj, Saraf, Aparajita, Chowdhury, Arkabandhu, Gabriel, Ashley, Bharambe, Ashwin, Eisenman, Assaf, Yazdan, Azadeh, James, Beau, Maurer, Ben, Leonhardi, Benjamin, Huang, Bernie, Loyd, Beth, De Paola, Beto, Paranjape, Bhargavi, Liu, Bing, Wu, Bo, Ni, Boyu, Hancock, Braden, Wasti, Bram, Spence, Brandon, Stojkovic, Brani, Gamido, Brian, Montalvo, Britt, Parker, Carl, Burton, Carly, Mejia, Catalina, Liu, Ce, Wang, Changhan, Kim, Changkyu, Zhou, Chao, Hu, Chester, Chu, Ching-Hsiang, Cai, Chris, Tindal, Chris, Feichtenhofer, Christoph, Gao, Cynthia, Civin, Damon, Beaty, Dana, Kreymer, Daniel, Li, Daniel, Adkins, David, Xu, David, Testuggine, Davide, David, Delia, Parikh, Devi, Liskovich, Diana, Foss, Didem, Wang, Dingkang, Le, Duc, Holland, Dustin, Dowling, Edward, Jamil, Eissa, Montgomery, Elaine, Presani, Eleonora, Hahn, Emily, Wood, Emily, Le, Eric-Tuan, Brinkman, Erik, Arcaute, Esteban, Dunbar, Evan, Smothers, Evan, Sun, Fei, Kreuk, Felix, Tian, Feng, Kokkinos, Filippos, Ozgenel, Firat, Caggioni, Francesco, Kanayet, Frank, Seide, Frank, Florez, Gabriela Medina, Schwarz, Gabriella, Badeer, Gada, Swee, Georgia, Halpern, Gil, Herman, Grant, Sizov, Grigory, Guangyi, Zhang, Lakshminarayanan, Guna, Inan, Hakan, Shojanazeri, Hamid, Zou, Han, Wang, Hannah, Zha, Hanwen, Habeeb, Haroun, Rudolph, Harrison, Suk, Helen, Aspegren, Henry, Goldman, Hunter, Zhan, Hongyuan, Damlaj, Ibrahim, Molybog, Igor, Tufanov, Igor, Leontiadis, Ilias, Veliche, Irina-Elena, Gat, Itai, Weissman, Jake, Geboski, James, Kohli, James, Lam, Janice, Asher, Japhet, Gaya, Jean-Baptiste, Marcus, Jeff, Tang, Jeff, Chan, Jennifer, Zhen, Jenny, Reizenstein, Jeremy, Teboul, Jeremy, Zhong, Jessica, Jin, Jian, Yang, Jingyi, Cummings, Joe, Carvill, Jon, Shepard, Jon, McPhie, Jonathan, Torres, Jonathan, Ginsburg, Josh, Wang, Junjie, Wu, Kai, U, Kam Hou, Saxena, Karan, Khandelwal, Kartikay, Zand, Katayoun, Matosich, Kathy, Veeraraghavan, Kaushik, Michelena, Kelly, Li, Keqian, Jagadeesh, Kiran, Huang, Kun, Chawla, Kunal, Huang, Kyle, Chen, Lailin, Garg, Lakshya, A, Lavender, Silva, Leandro, Bell, Lee, Zhang, Lei, Guo, Liangpeng, Yu, Licheng, Moshkovich, Liron, Wehrstedt, Luca, Khabsa, Madian, Avalani, Manav, Bhatt, Manish, Mankus, Martynas, Hasson, Matan, Lennie, Matthew, Reso, Matthias, Groshev, Maxim, Naumov, Maxim, Lathi, Maya, Keneally, Meghan, Liu, Miao, Seltzer, Michael L., Valko, Michal, Restrepo, Michelle, Patel, Mihir, Vyatskov, Mik, Samvelyan, Mikayel, Clark, Mike, Macey, Mike, Wang, Mike, Hermoso, Miquel Jubert, Metanat, Mo, Rastegari, Mohammad, Bansal, Munish, Santhanam, Nandhini, Parks, Natascha, White, Natasha, Bawa, Navyata, Singhal, Nayan, Egebo, Nick, Usunier, Nicolas, Mehta, Nikhil, Laptev, Nikolay Pavlovich, Dong, Ning, Cheng, Norman, Chernoguz, Oleg, Hart, Olivia, Salpekar, Omkar, Kalinli, Ozlem, Kent, Parkin, Parekh, Parth, Saab, Paul, 
Balaji, Pavan, Rittner, Pedro, Bontrager, Philip, Roux, Pierre, Dollar, Piotr, Zvyagina, Polina, Ratanchandani, Prashant, Yuvraj, Pritish, Liang, Qian, Alao, Rachad, Rodriguez, Rachel, Ayub, Rafi, Murthy, Raghotham, Nayani, Raghu, Mitra, Rahul, Parthasarathy, Rangaprabhu, Li, Raymond, Hogan, Rebekkah, Battey, Robin, Wang, Rocky, Howes, Russ, Rinott, Ruty, Mehta, Sachin, Siby, Sachin, Bondu, Sai Jayesh, Datta, Samyak, Chugh, Sara, Hunt, Sara, Dhillon, Sargun, Sidorov, Sasha, Pan, Satadru, Mahajan, Saurabh, Verma, Saurabh, Yamamoto, Seiji, Ramaswamy, Sharadh, Lindsay, Shaun, Feng, Sheng, Lin, Shenghao, Zha, Shengxin Cindy, Patil, Shishir, Shankar, Shiva, Zhang, Shuqiang, Wang, Sinong, Agarwal, Sneha, Sajuyigbe, Soji, Chintala, Soumith, Max, Stephanie, Chen, Stephen, Kehoe, Steve, Satterfield, Steve, Govindaprasad, Sudarshan, Gupta, Sumit, Deng, Summer, Cho, Sungmin, Virk, Sunny, Subramanian, Suraj, Choudhury, Sy, Goldman, Sydney, Remez, Tal, Glaser, Tamar, Best, Tamara, Koehler, Thilo, Robinson, Thomas, Li, Tianhe, Zhang, Tianjun, Matthews, Tim, Chou, Timothy, Shaked, Tzook, Vontimitta, Varun, Ajayi, Victoria, Montanez, Victoria, Mohan, Vijai, Kumar, Vinay Satish, Mangla, Vishal, Ionescu, Vlad, Poenaru, Vlad, Mihailescu, Vlad Tiberiu, Ivanov, Vladimir, Li, Wei, Wang, Wenchen, Jiang, Wenwen, Bouaziz, Wes, Constable, Will, Tang, Xiaocheng, Wu, Xiaojian, Wang, Xiaolan, Wu, Xilun, Gao, Xinbo, Kleinman, Yaniv, Chen, Yanjun, Hu, Ye, Jia, Ye, Qi, Ye, Li, Yenda, Zhang, Yilin, Zhang, Ying, Adi, Yossi, Nam, Youngjin, Yu, Wang, Zhao, Yu, Hao, Yuchen, Qian, Yundi, Li, Yunlu, He, Yuzi, Rait, Zach, DeVito, Zachary, Rosnbrick, Zef, Wen, Zhaoduo, Yang, Zhenyu, Zhao, Zhiwei, and Ma, Zhiyu
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
- Published
- 2024
3. The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
- Author
Bandarkar, Lucas, Liang, Davis, Muller, Benjamin, Artetxe, Mikel, Shukla, Satya Narayan, Husa, Donald, Goyal, Naman, Krishnan, Abhinandan, Zettlemoyer, Luke, and Khabsa, Madian
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, I.2.7
- Abstract
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
- Comment: ACL 2024
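As a concrete illustration of how such a benchmark is consumed (a sketch, not code from the paper; the dataset id facebook/belebele and the field names are assumptions based on the public Hugging Face release):

```python
# Hypothetical evaluation prompt built from one Belebele row; the dataset id
# and field names assume the public Hugging Face copy of the dataset.
from datasets import load_dataset

ds = load_dataset("facebook/belebele", "eng_Latn", split="test")
row = ds[0]
prompt = (
    f"{row['flores_passage']}\n\n"
    f"Q: {row['question']}\n"
    + "\n".join(f"{i}. {row[f'mc_answer{i}']}" for i in range(1, 5))
    + "\nAnswer:"
)
gold = int(row["correct_answer_num"])  # 1-4; compare against the model's pick
```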
- Published
- 2023
4. Llama 2: Open Foundation and Fine-Tuned Chat Models
- Author
Touvron, Hugo, Martin, Louis, Stone, Kevin, Albert, Peter, Almahairi, Amjad, Babaei, Yasmine, Bashlykov, Nikolay, Batra, Soumya, Bhargava, Prajjwal, Bhosale, Shruti, Bikel, Dan, Blecher, Lukas, Ferrer, Cristian Canton, Chen, Moya, Cucurull, Guillem, Esiobu, David, Fernandes, Jude, Fu, Jeremy, Fu, Wenyin, Fuller, Brian, Gao, Cynthia, Goswami, Vedanuj, Goyal, Naman, Hartshorn, Anthony, Hosseini, Saghar, Hou, Rui, Inan, Hakan, Kardas, Marcin, Kerkez, Viktor, Khabsa, Madian, Kloumann, Isabel, Korenev, Artem, Koura, Punit Singh, Lachaux, Marie-Anne, Lavril, Thibaut, Lee, Jenya, Liskovich, Diana, Lu, Yinghai, Mao, Yuning, Martinet, Xavier, Mihaylov, Todor, Mishra, Pushkar, Molybog, Igor, Nie, Yixin, Poulton, Andrew, Reizenstein, Jeremy, Rungta, Rashi, Saladi, Kalyan, Schelten, Alan, Silva, Ruan, Smith, Eric Michael, Subramanian, Ranjan, Tan, Xiaoqing Ellen, Tang, Binh, Taylor, Ross, Williams, Adina, Kuan, Jian Xiang, Xu, Puxin, Yan, Zheng, Zarov, Iliyan, Zhang, Yuchen, Fan, Angela, Kambadur, Melanie, Narang, Sharan, Rodriguez, Aurelien, Stojnic, Robert, Edunov, Sergey, and Scialom, Thomas
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
- Published
- 2023
5. A Theory on Adam Instability in Large-Scale Machine Learning
- Author
Molybog, Igor, Albert, Peter, Chen, Moya, DeVito, Zachary, Esiobu, David, Goyal, Naman, Koura, Punit Singh, Narang, Sharan, Poulton, Andrew, Silva, Ruan, Tang, Binh, Liskovich, Diana, Xu, Puxin, Zhang, Yuchen, Kambadur, Melanie, Roller, Stephen, and Zhang, Susan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
- Abstract
We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent on the training loss landscape, leading to divergence. This artifact is more likely to be observed in the training of a deep model with a large batch size, which is the typical setting of large-scale language model training. To support the theory, we present observations from the training runs of language models of different scales: 7 billion, 30 billion, 65 billion, and 546 billion parameters.
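A minimal sketch of the diagnostic this abstract suggests (function and variable names are ours, not the paper's): track the cosine similarity between the Adam update and the raw gradient; values near zero flag the uncorrelated-update state described above.

```python
import torch

def adam_update_alignment(grad, exp_avg, exp_avg_sq, step,
                          beta1=0.9, beta2=0.95, eps=1e-8):
    """Return the Adam update (unit learning rate) and its cosine
    similarity with the gradient; near-zero similarity means the update
    is essentially uncorrelated with the descent direction."""
    exp_avg = beta1 * exp_avg + (1 - beta1) * grad
    exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad ** 2
    m_hat = exp_avg / (1 - beta1 ** step)       # bias-corrected 1st moment
    v_hat = exp_avg_sq / (1 - beta2 ** step)    # bias-corrected 2nd moment
    update = m_hat / (v_hat.sqrt() + eps)
    cos = torch.nn.functional.cosine_similarity(
        update.flatten(), grad.flatten(), dim=0)
    return update, cos
```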
- Published
- 2023
6. LLaMA: Open and Efficient Foundation Language Models
- Author
Touvron, Hugo, Lavril, Thibaut, Izacard, Gautier, Martinet, Xavier, Lachaux, Marie-Anne, Lacroix, Timothée, Rozière, Baptiste, Goyal, Naman, Hambro, Eric, Azhar, Faisal, Rodriguez, Aurelien, Joulin, Armand, Grave, Edouard, and Lample, Guillaume
- Subjects
Computer Science - Computation and Language
- Abstract
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
- Published
- 2023
7. Text-To-4D Dynamic Scene Generation
- Author
Singer, Uriel, Sheynin, Shelly, Polyak, Adam, Ashual, Oron, Makarov, Iurii, Kokkinos, Filippos, Goyal, Naman, Vedaldi, Andrea, Parikh, Devi, Johnson, Justin, and Taigman, Yaniv
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment. MAV3D does not require any 3D or 4D data and the T2V model is trained only on Text-Image pairs and unlabeled videos. We demonstrate the effectiveness of our approach using comprehensive quantitative and qualitative experiments and show an improvement over previously established internal baselines. To the best of our knowledge, our method is the first to generate 3D dynamic scenes given a text description.
- Published
- 2023
8. XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
- Author
Liang, Davis, Gonen, Hila, Mao, Yuning, Hou, Rui, Goyal, Naman, Ghazvininejad, Marjan, Zettlemoyer, Luke, and Khabsa, Madian
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This "vocabulary bottleneck" limits the representational capabilities of multilingual models like XLM-R. In this paper, we introduce a new approach for scaling to very large multilingual vocabularies by de-emphasizing token sharing between languages with little lexical overlap and assigning vocabulary capacity to achieve sufficient coverage for each individual language. Tokenizations using our vocabulary are typically more semantically meaningful and shorter compared to XLM-R. Leveraging this improved vocabulary, we train XLM-V, a multilingual language model with a one million token vocabulary. XLM-V outperforms XLM-R on every task we tested, ranging from natural language inference (XNLI) and question answering (MLQA, XQuAD, TyDiQA) to named entity recognition (WikiAnn). XLM-V is particularly effective on low-resource language tasks and outperforms XLM-R by 11.2% and 5.8% absolute on MasakhaNER and Americas NLI, respectively.
- Comment: EMNLP 2023
- Published
- 2023
9. Scaling Laws for Generative Mixed-Modal Language Models
- Author
Aghajanyan, Armen, Yu, Lili, Conneau, Alexis, Hsu, Wei-Ning, Hambardzumyan, Karen, Zhang, Susan, Roller, Stephen, Goyal, Naman, Levy, Omer, and Zettlemoyer, Luke
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion, trained on 5-100 billion tokens. We report new mixed-modal scaling laws that unify the contributions of individual modalities and the interactions between them. Specifically, we explicitly model the optimal synergy and competition due to data and model size as an additive term to previous uni-modal scaling laws. We also find four empirical phenomena observed during the training, such as emergent coordinate-ascent style training that naturally alternates between modalities, guidelines for selecting critical hyper-parameters, and connections between mixed-modal competition and training stability. Finally, we test our scaling law by training a 30B speech-text model, which significantly outperforms the corresponding unimodal models. Overall, our research provides valuable insights into the design and training of mixed-modal generative models, an important new class of unified models that have unique distributional properties.
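Schematically (this is an illustrative form, not the paper's exact fitted law), the additive treatment can be written by starting from a Chinchilla-style uni-modal law and adding an interaction term for each modality pair:

```latex
% Uni-modal law for modality i (N = parameters, D = tokens):
\[ L_i(N, D) = E_i + A_i N^{-\alpha_i} + B_i D^{-\beta_i} \]
% Illustrative mixed-modal form with an additive interaction term:
\[ L_{i,j}(N, D) = \min\bigl(L_i(N, D),\, L_j(N, D)\bigr) + C_{ij}\, N^{-\alpha_{ij}} D^{-\beta_{ij}} \]
% C_{ij} < 0 expresses synergy between the two modalities; C_{ij} > 0 competition.
```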
- Published
- 2023
10. A survey on Self Supervised learning approaches for improving Multimodal representation learning
- Author
Goyal, Naman
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, self-supervised learning has seen explosive growth and use in a variety of machine learning tasks because of its ability to avoid the cost of annotating large-scale datasets. This paper gives an overview of the best self-supervised learning approaches for multimodal learning. The presented approaches have been aggregated through an extensive study of the literature and tackle the application of self-supervised learning in different ways. The approaches discussed are cross-modal generation, cross-modal pretraining, cyclic translation, and generating unimodal labels in a self-supervised fashion.
- Published
- 2022
11. BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
- Author
Shuster, Kurt, Xu, Jing, Komeili, Mojtaba, Ju, Da, Smith, Eric Michael, Roller, Stephen, Ung, Megan, Chen, Moya, Arora, Kushal, Lane, Joshua, Behrooz, Morteza, Ngan, William, Poff, Spencer, Goyal, Naman, Szlam, Arthur, Boureau, Y-Lan, Kambadur, Melanie, and Weston, Jason
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and trained on a large number of user-defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (architecture, model and training scheme), and details of its deployment, including safety mechanisms. Human evaluations show its superiority to existing open-domain dialogue agents, including its predecessors (Roller et al., 2021; Komeili et al., 2022). Finally, we detail our plan for continual learning using the data collected from deployment, which will also be publicly released. The goal of this research program is thus to enable the community to study ever-improving responsible agents that learn through interaction.
- Published
- 2022
12. On the Role of Bidirectionality in Language Model Pre-Training
- Author
Artetxe, Mikel, Du, Jingfei, Goyal, Naman, Zettlemoyer, Luke, and Stoyanov, Ves
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult. In this work, we focus on bidirectionality as a key factor that differentiates existing approaches, and present a comprehensive study of its role in next token prediction, text infilling, zero-shot priming and fine-tuning. We propose a new framework that generalizes prior approaches, including fully unidirectional models like GPT, fully bidirectional models like BERT, and hybrid models like CM3 and prefix LM. Our framework distinguishes between two notions of bidirectionality (bidirectional context and bidirectional attention) and allows us to control each of them separately. We find that the optimal configuration is largely application-dependent (e.g., bidirectional attention is beneficial for fine-tuning and infilling, but harmful for next token prediction and zero-shot priming). We train models with up to 6.7B parameters, and find differences to remain consistent at scale. While prior work on scaling has focused on left-to-right autoregressive models, our results suggest that this approach comes with some trade-offs, and it might be worthwhile to develop very large bidirectional models.
- Comment: Findings of EMNLP 2022
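The two notions can be made concrete with an attention mask. A minimal sketch (ours, not the paper's code) of a prefix-LM mask, which interpolates between the fully causal and fully bidirectional regimes the study compares:

```python
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """mask[i, j] is True when position i may attend to position j."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    mask[:, :prefix_len] = True  # bidirectional attention over the prefix
    return mask

# prefix_len = 0 gives a fully causal (GPT-style) mask;
# prefix_len = seq_len gives fully bidirectional (BERT-style) attention.
print(prefix_lm_mask(5, 2).int())
```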
- Published
- 2022
13. Lifting the Curse of Multilinguality by Pre-training Modular Transformers
- Author
Pfeiffer, Jonas, Goyal, Naman, Lin, Xi Victoria, Li, Xian, Cross, James, Riedel, Sebastian, and Artetxe, Mikel
- Subjects
Computer Science - Computation and Language
- Abstract
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
- Comment: NAACL 2022
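A minimal sketch of the language-specific-modules idea (module layout and names are ours; the actual X-Mod architecture differs in detail): a shared transformer layer followed by a per-language feed-forward module selected by language id, so total capacity grows with the number of languages while per-language trainable parameters stay constant.

```python
import torch.nn as nn

class ModularLayer(nn.Module):
    def __init__(self, d_model, languages, n_heads=8, d_ff=2048):
        super().__init__()
        self.shared = nn.TransformerEncoderLayer(
            d_model, n_heads, d_ff, batch_first=True)
        self.lang_module = nn.ModuleDict({
            lang: nn.Sequential(
                nn.LayerNorm(d_model),
                nn.Linear(d_model, d_ff), nn.GELU(),
                nn.Linear(d_ff, d_model))
            for lang in languages})

    def forward(self, x, lang: str):
        h = self.shared(x)
        return h + self.lang_module[lang](h)  # residual per-language module

layer = ModularLayer(512, ["en", "sw", "ur"])
```

Adding a language post-hoc then amounts to training one new entry in `lang_module` while the shared parameters stay frozen.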
- Published
- 2022
14. OPT: Open Pre-trained Transformer Language Models
- Author
Zhang, Susan, Roller, Stephen, Goyal, Naman, Artetxe, Mikel, Chen, Moya, Chen, Shuohui, Dewan, Christopher, Diab, Mona, Li, Xian, Lin, Xi Victoria, Mihaylov, Todor, Ott, Myle, Shleifer, Sam, Shuster, Kurt, Simig, Daniel, Koura, Punit Singh, Sridhar, Anjali, Wang, Tianlu, and Zettlemoyer, Luke
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
- Published
- 2022
15. How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?
- Author
Zhang, Shiyue, Chaudhary, Vishrav, Goyal, Naman, Cross, James, Wenzek, Guillaume, Bansal, Mohit, and Guzman, Francisco
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In this work, we analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus. We find that while relatively better performance is often observed when languages are more equally sampled, the downstream performance is more robust to language imbalance than we usually expected. Two features, UNK rate and closeness to the character level, can warn of poor downstream performance before performing the task. We also distinguish language sampling for tokenizer training from sampling for model training and show that the model is more sensitive to the latter.
- Comment: AMTA 2022
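The balancing strategy referred to here is usually temperature-based sampling over languages, as popularized by XLM-R; a minimal sketch (the paper's exact data ratios differ):

```python
def language_sampling_probs(counts: dict, temperature: float = 5.0) -> dict:
    """Rescale empirical language shares q_i to p_i ~ q_i ** (1 / T).
    T = 1 keeps the skewed corpus distribution; larger T moves toward uniform."""
    total = sum(counts.values())
    scaled = {lang: (n / total) ** (1.0 / temperature)
              for lang, n in counts.items()}
    z = sum(scaled.values())
    return {lang: s / z for lang, s in scaled.items()}

print(language_sampling_probs({"en": 1_000_000, "ta": 10_000, "iu": 1_000}))
```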
- Published
- 2022
16. Graph Neural Networks for Image Classification and Reinforcement Learning using Graph representations
- Author
Goyal, Naman and Steiner, David
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we evaluate the performance of graph neural networks in two distinct domains: computer vision and reinforcement learning. In the computer vision section, we seek to learn whether a novel non-redundant representation for images as graphs can improve performance over trivial pixel-to-node mapping on a graph-level prediction task, specifically image classification. For the reinforcement learning section, we seek to learn whether explicitly modeling the solving of a Rubik's cube as a graph problem can improve performance over a standard model-free technique with no inductive bias.
- Comment: The work was done as a project for the Neural Networks and Deep Learning course, Fall 2021 offering by Prof. Richard Zemel at Columbia University
- Published
- 2022
17. CM3: A Causal Masked Multimodal Model of the Internet
- Author
Aghajanyan, Armen, Huang, Bernie, Ross, Candace, Karpukhin, Vladimir, Xu, Hu, Goyal, Naman, Okhonko, Dmytro, Joshi, Mandar, Ghosh, Gargi, Lewis, Mike, and Zettlemoyer, Luke
- Subjects
Computer Science - Computation and Language
- Abstract
We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of their original positions. The causal masking objective provides a type of hybrid of the more common causal and masked language models, by enabling full generative modeling while also providing bidirectional context when generating the masked spans. We train causally masked language-image models on large-scale web and Wikipedia articles, where each document contains all of the text, hypertext markup, hyperlinks, and image tokens (from a VQVAE-GAN), provided in the order they appear in the original HTML source (before masking). The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts, and thereby implicitly learn a wide range of text, image, and cross-modal tasks. They can be prompted to recover, in a zero-shot fashion, the functionality of models such as DALL-E, GENRE, and HTLM. We set the new state-of-the-art in zero-shot summarization, entity linking, and entity disambiguation while maintaining competitive performance in the fine-tuning setting. We can generate images unconditionally, conditioned on text (like DALL-E) and do captioning all in a zero-shot setting with a single model.
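A minimal sketch (ours) of the causally-masked document transform the abstract describes: a span is cut out, a sentinel is left in its place, and the span is regenerated at the end of the sequence, giving the model bidirectional context around the hole while keeping left-to-right generation.

```python
import random

def causal_mask_transform(tokens: list, mask_token: str = "<mask:0>") -> list:
    """Move one random span to the end of the sequence behind a sentinel."""
    if len(tokens) < 2:
        return tokens
    start = random.randrange(len(tokens) - 1)
    end = random.randrange(start + 1, len(tokens))
    span = tokens[start:end]
    # Training target: prefix, sentinel, suffix, sentinel, masked span.
    return tokens[:start] + [mask_token] + tokens[end:] + [mask_token] + span

print(causal_mask_transform("the cat sat on the mat".split()))
```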
- Published
- 2022
18. Efficient Large Scale Language Modeling with Mixtures of Experts
- Author
Artetxe, Mikel, Bhosale, Shruti, Goyal, Naman, Mihaylov, Todor, Ott, Myle, Shleifer, Sam, Lin, Xi Victoria, Du, Jingfei, Iyer, Srinivasan, Pasunuru, Ramakanth, Anantharaman, Giri, Li, Xian, Chen, Shuohui, Akin, Halil, Baines, Mandeep, Martin, Louis, Zhou, Xing, Koura, Punit Singh, O'Horo, Brian, Wang, Jeff, Zettlemoyer, Luke, Diab, Mona, Kozareva, Zornitsa, and Stoyanov, Ves
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using ~4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
- Comment: EMNLP 2022
- Published
- 2021
19. Few-shot Learning with Multilingual Language Models
- Author
Lin, Xi Victoria, Mihaylov, Todor, Artetxe, Mikel, Wang, Tianlu, Chen, Shuohui, Simig, Daniel, Ott, Myle, Goyal, Naman, Bhosale, Shruti, Du, Jingfei, Pasunuru, Ramakanth, Shleifer, Sam, Koura, Punit Singh, Chaudhary, Vishrav, O'Horo, Brian, Wang, Jeff, Zettlemoyer, Luke, Kozareva, Zornitsa, Diab, Mona, Stoyanov, Veselin, and Li, Xian
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters sets new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We conduct an in-depth analysis of different multilingual prompting approaches, showing in particular that strong few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples. Finally, we evaluate our models in social value tasks such as hate speech detection in five languages and find they have limitations similar to comparably sized GPT-3 models.
- Comment: Accepted to EMNLP 2022; 34 pages
- Published
- 2021
20. XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
- Author
Babu, Arun, Wang, Changhan, Tjandra, Andros, Lakhotia, Kushal, Xu, Qiantong, Goyal, Naman, Singh, Kritika, von Platen, Patrick, Saraf, Yatharth, Pino, Juan, Baevski, Alexei, Conneau, Alexis, and Auli, Michael
- Subjects
Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error rates by 14-34% relative on average. XLS-R also sets a new state of the art on VoxLingua107 language identification. Moreover, we show that with sufficient model size, cross-lingual pretraining can outperform English-only pretraining when translating English speech into other languages, a setting which favors monolingual pretraining. We hope XLS-R can help to improve speech processing tasks for many more languages of the world.
- Published
- 2021
21. FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task
- Author
Tang, Yun, Gong, Hongyu, Li, Xian, Wang, Changhan, Pino, Juan, Schwenk, Holger, and Goyal, Naman
- Subjects
Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task. Our system is built by leveraging transfer learning across modalities, tasks and languages. First, we leverage general-purpose multilingual modules pretrained with large amounts of unlabelled and labelled data. We further enable knowledge transfer from the text task to the speech task by training two tasks jointly. Finally, our multilingual model is finetuned on speech translation task-specific data to achieve the best translation results. Experimental results show our system outperforms the reported systems, including both end-to-end and cascade-based approaches, by a large margin. In some translation directions, our speech translation results evaluated on the public Multilingual TEDx test set are even comparable with the ones from a strong text-to-text translation system, which uses the oracle speech transcripts as input.
- Comment: Accepted by IWSLT 2021 as a system paper
- Published
- 2021
22. The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
- Author
Goyal, Naman, Gao, Cynthia, Chaudhary, Vishrav, Chen, Peng-Jen, Wenzek, Guillaume, Ju, Da, Krishnan, Sanjana, Ranzato, Marc'Aurelio, Guzman, Francisco, and Fan, Angela
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are multilingually aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.
- Published
- 2021
23. Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
- Author
Ko, Wei-Jen, El-Kishky, Ahmed, Renduchintala, Adithya, Chaudhary, Vishrav, Goyal, Naman, Guzmán, Francisco, Fung, Pascale, Koehn, Philipp, and Diab, Mona
- Subjects
Computer Science - Computation and Language
- Abstract
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language with only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on 7 languages from three different language families and show that our technique significantly improves translation into the low-resource language compared to other translation baselines.
- Comment: ACL 2021
- Published
- 2021
24. Larger-Scale Transformers for Multilingual Masked Language Modeling
- Author
Goyal, Naman, Du, Jingfei, Ott, Myle, Anantharaman, Giri, and Conneau, Alexis
- Subjects
Computer Science - Computation and Language
- Abstract
Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity may obtain strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.
- Comment: 4 pages
- Published
- 2021
25. BASE Layers: Simplifying Training of Large, Sparse Models
- Author
Lewis, Mike, Bhosale, Shruti, Dettmers, Tim, Goyal, Naman, and Zettlemoyer, Luke
- Subjects
Computer Science - Computation and Language
- Abstract
We introduce a new balanced assignment of experts (BASE) layer for large language models that greatly simplifies existing high capacity sparse layers. Sparse layers can dramatically improve the efficiency of training and inference by routing each token to specialized expert modules that contain only a small fraction of the model parameters. However, it can be difficult to learn balanced routing functions that make full use of the available experts; existing approaches typically use routing heuristics or auxiliary expert-balancing loss functions. In contrast, we formulate token-to-expert allocation as a linear assignment problem, allowing an optimal assignment in which each expert receives an equal number of tokens. This optimal assignment scheme improves efficiency by guaranteeing balanced compute loads, and also simplifies training by not requiring any new hyperparameters or auxiliary losses. Code is publicly released at https://github.com/pytorch/fairseq/
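A minimal sketch of the balanced-assignment idea (illustrative only; the released fairseq code solves the assignment with the auction algorithm rather than the Hungarian solver used here): duplicate each expert's column once per slot so that a square linear assignment gives every expert exactly the same number of tokens.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_expert_assignment(scores: np.ndarray, num_experts: int):
    """scores: (num_tokens, num_experts) affinities; num_tokens must be a
    multiple of num_experts. Returns the expert id assigned to each token."""
    num_tokens = scores.shape[0]
    cap = num_tokens // num_experts            # equal tokens per expert
    expanded = np.repeat(scores, cap, axis=1)  # one column per expert slot
    rows, cols = linear_sum_assignment(expanded, maximize=True)
    return cols[np.argsort(rows)] // cap       # map slot back to expert id

print(balanced_expert_assignment(np.random.randn(8, 2), num_experts=2))
```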
- Published
- 2021
26. Multilingual Autoregressive Entity Linking
- Author
De Cao, Nicola, Wu, Ledell, Popat, Kashyap, Artetxe, Mikel, Goyal, Naman, Plekhanov, Mikhail, Zettlemoyer, Luke, Cancedda, Nicola, Riedel, Sebastian, and Petroni, Fabio
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem -- the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode mention string and entity names to capture more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB even for mentions that do not appear in mention tables and with no need for large-scale vector indices. While prior MEL works use a single representation for each entity, we match against entity names of as many languages as possible, which allows exploiting language connections between source input and target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to over 50% improvements in average accuracy. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks where mGENRE establishes new state-of-the-art results. Code and pre-trained models at https://github.com/facebookresearch/GENRE.
- Comment: 20 pages, 8 figures, and 11 tables
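A minimal sketch (data layout and names are ours) of the zero-shot marginalization described above: score the mention against each candidate entity's name in every available language and combine the per-language log-probabilities with log-sum-exp.

```python
import math

def rank_entities(name_logprobs: dict, candidates: dict):
    """name_logprobs: {(lang, name): log p(name | mention context)} from the
    seq2seq model; candidates: {entity_id: {lang: name}}."""
    scores = {}
    for ent, names in candidates.items():
        lps = [name_logprobs[(lang, nm)] for lang, nm in names.items()
               if (lang, nm) in name_logprobs]
        if lps:  # log-sum-exp over languages, computed stably
            m = max(lps)
            scores[ent] = m + math.log(sum(math.exp(x - m) for x in lps))
    return max(scores, key=scores.get)
```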
- Published
- 2021
27. Facebook AI's WMT20 News Translation Task Submission
- Author
Chen, Peng-Jen, Lee, Ann, Wang, Changhan, Goyal, Naman, Fan, Angela, Williamson, Mary, and Gu, Jiatao
- Subjects
Computer Science - Computation and Language
- Abstract
This paper describes Facebook AI's submission to WMT20 shared news translation task. We focus on the low resource setting and participate in two language pairs, Tamil <-> English and Inuktitut <-> English, where there are limited out-of-domain bitext and monolingual data. We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain. We explore techniques that leverage bitext and monolingual data from all languages, such as self-supervised model pretraining, multilingual models, data augmentation, and reranking. To better adapt the translation system to the test domain, we explore dataset tagging and fine-tuning on in-domain data. We observe that different techniques provide varied improvements based on the available data of the language pair. Based on the finding, we integrate these techniques into one training pipeline. For En->Ta, we explore an unconstrained setup with additional Tamil bitext and monolingual data and show that further improvement can be obtained. On the test set, our best submitted systems achieve 21.5 and 13.7 BLEU for Ta->En and En->Ta respectively, and 27.9 and 13.0 for Iu->En and En->Iu respectively.
- Published
- 2020
28. Beyond English-Centric Multilingual Machine Translation
- Author
Fan, Angela, Bhosale, Shruti, Schwenk, Holger, Ma, Zhiyi, El-Kishky, Ahmed, Goyal, Siddharth, Baines, Mandeep, Celebi, Onur, Wenzek, Guillaume, Chaudhary, Vishrav, Goyal, Naman, Birch, Tom, Liptchinsky, Vitaliy, Edunov, Sergey, Grave, Edouard, Auli, Michael, and Joulin, Armand
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.
- Published
- 2020
29. Better Fine-Tuning by Reducing Representational Collapse
- Author
Aghajanyan, Armen, Shrivastava, Akshat, Gupta, Anchit, Goyal, Naman, Zettlemoyer, Luke, and Gupta, Sonal
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
- Abstract
Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse; the pre-trained models maintain more generalizable representations every time they are fine-tuned.
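A minimal sketch of the parametric-noise objective (hyperparameters, names, and the Hugging Face-style model interface are our assumptions): perturb the input embeddings with small uniform or normal noise and penalize divergence between the clean and noisy predictions alongside the task loss.

```python
import torch
import torch.nn.functional as F

def noise_robust_loss(model, input_embeds, labels, eps=1e-5, lam=1.0):
    """Task loss plus symmetric KL between clean and noise-perturbed outputs."""
    logits = model(inputs_embeds=input_embeds).logits
    noise = torch.empty_like(input_embeds).uniform_(-eps, eps)  # or .normal_()
    noisy = model(inputs_embeds=input_embeds + noise).logits
    task = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    p = F.log_softmax(logits, dim=-1)
    q = F.log_softmax(noisy, dim=-1)
    sym_kl = (F.kl_div(p, q, reduction="batchmean", log_target=True)
              + F.kl_div(q, p, reduction="batchmean", log_target=True))
    return task + lam * sym_kl
```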
- Published
- 2020
30. Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
- Author
Tang, Yuqing, Tran, Chau, Li, Xian, Chen, Peng-Jen, Goyal, Naman, Chaudhary, Vishrav, Gu, Jiatao, and Fan, Angela
- Subjects
Computer Science - Computation and Language
- Abstract
Recent work demonstrates the potential of multilingual pretraining for creating one model that can be used for various tasks in different languages. Previous work in multilingual pretraining has demonstrated that machine translation systems can be created by finetuning on bitext. In this work, we show that multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one direction, a pretrained model is finetuned on many directions at the same time. Compared to multilingual models trained from scratch, starting from pretrained models incorporates the benefits of large quantities of unlabeled monolingual data, which is particularly important for low resource languages where bitext is not available. We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance. We double the number of languages in mBART to support multilingual machine translation models of 50 languages. Finally, we create the ML50 benchmark, covering low, mid, and high resource languages, to facilitate reproducible research by standardizing training and evaluation data. On ML50, we demonstrate that multilingual finetuning improves by an average of 1 BLEU over the strongest baselines (being either multilingual from scratch or bilingual finetuning) while improving by 9.3 BLEU on average over bilingual baselines from scratch.
- Comment: 10 pages (main) + 5 pages (appendices). 9 tables and 2 figures
- Published
- 2020
31. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Author
Lewis, Patrick, Perez, Ethan, Piktus, Aleksandra, Petroni, Fabio, Karpukhin, Vladimir, Goyal, Naman, Küttler, Heinrich, Lewis, Mike, Yih, Wen-tau, Rocktäschel, Tim, Riedel, Sebastian, and Kiela, Douwe
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
- Comment: Accepted at NeurIPS 2020
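The two formulations differ only in where the marginalization over retrieved passages happens (retriever p_eta, generator p_theta, top-k retrieved passages z), as in the paper:

```latex
\[ p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k} p_\eta(z \mid x) \prod_{t} p_\theta(y_t \mid x, z, y_{<t}) \]
\[ p_{\text{RAG-Token}}(y \mid x) \approx \prod_{t} \sum_{z \in \text{top-}k} p_\eta(z \mid x)\, p_\theta(y_t \mid x, z, y_{<t}) \]
```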
- Published
- 2020
32. Recipes for building an open-domain chatbot
- Author
Roller, Stephen, Dinan, Emily, Goyal, Naman, Ju, Da, Williamson, Mary, Liu, Yinhan, Xu, Jing, Ott, Myle, Shuster, Kurt, Smith, Eric M., Boureau, Y-Lan, and Weston, Jason
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.
- Published
- 2020
33. An Axiomatic Analysis for Object Detection and Recognition Using Deep Learning
- Author
Goyal, Naman, Goraya, Major Singh, and Singh, Tajinder. In: Bhattacharyya, Siddhartha, Banerjee, Jyoti Sekhar, and Köppen, Mario (editors); Howlett, Robert J. and Jain, Lakhmi C. (series editors)
- Published
- 2023
34. Multilingual Denoising Pre-training for Neural Machine Translation
- Author
Liu, Yinhan, Gu, Jiatao, Goyal, Naman, Li, Xian, Edunov, Sergey, Ghazvininejad, Marjan, Lewis, Mike, and Zettlemoyer, Luke
- Subjects
Computer Science - Computation and Language
- Abstract
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
- Comment: Work in progress
- Published
- 2020
35. Unsupervised Cross-lingual Representation Learning at Scale
- Author
-
Conneau, Alexis, Khandelwal, Kartikay, Goyal, Naman, Chaudhary, Vishrav, Wenzek, Guillaume, Guzmán, Francisco, Grave, Edouard, Ott, Myle, Zettlemoyer, Luke, and Stoyanov, Veselin
- Subjects
Computer Science - Computation and Language - Abstract
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +14.6% average accuracy on XNLI, +13% average F1 score on MLQA, and +2.4% F1 score on NER. XLM-R performs particularly well on low-resource languages, improving XNLI accuracy by 15.7% for Swahili and 11.4% for Urdu over previous XLM models. We also present a detailed empirical analysis of the key factors required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high- and low-resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks (a brief usage sketch follows this entry). We will make our code, data and models publicly available., Comment: ACL 2020 (+ updated results)
- Published
- 2019
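Because XLM-R shares one model and one vocabulary across a hundred languages, the same checkpoint answers cloze queries in any of them. A minimal sketch, assuming the base checkpoint published on the Hugging Face hub:

```python
from transformers import pipeline

# XLM-R uses "<mask>" as its mask token regardless of the input language.
fill = pipeline("fill-mask", model="xlm-roberta-base")
print(fill("Paris is the <mask> of France.")[0]["token_str"])
print(fill("Paris ist die <mask> von Frankreich.")[0]["token_str"])
```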
36. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Author
-
Lewis, Mike, Liu, Yinhan, Goyal, Naman, Ghazvininejad, Marjan, Mohamed, Abdelrahman, Levy, Omer, Stoyanov, Ves, and Zettlemoyer, Luke
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token (see the noising sketch after this entry). BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target-language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
- Published
- 2019
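The in-filling scheme the abstract highlights is easy to state precisely: sampled spans (lengths drawn from a Poisson distribution, lambda = 3 in the paper) are each replaced by a single mask token, so the model must also infer how many tokens are missing. A minimal sketch; the function name and bookkeeping are illustrative, not the authors' code:

```python
import numpy as np

def text_infill(tokens, mask_token="<mask>", mask_ratio=0.3, poisson_lam=3.0, rng=None):
    """BART-style text infilling sketch: replace each sampled span with ONE mask.

    A zero-length span inserts a mask between tokens, matching the paper's
    description; roughly mask_ratio of the tokens end up corrupted.
    """
    rng = rng or np.random.default_rng()
    tokens = list(tokens)
    budget = int(mask_ratio * len(tokens))
    while budget > 0 and tokens:
        span = min(int(rng.poisson(poisson_lam)), budget, len(tokens))
        start = int(rng.integers(0, len(tokens) - span + 1))
        tokens[start:start + span] = [mask_token]  # whole span -> single mask
        budget -= max(span, 1)                     # count zero-length spans too
    return tokens

# e.g. text_infill("the quick brown fox jumps over the lazy dog".split())
```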
37. RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Author
-
Liu, Yinhan, Ott, Myle, Goyal, Naman, Du, Jingfei, Joshi, Mandar, Chen, Danqi, Levy, Omer, Lewis, Mike, Zettlemoyer, Luke, and Stoyanov, Veselin
- Subjects
Computer Science - Computation and Language - Abstract
Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have a significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size (one such choice, dynamic masking, is sketched after this entry). We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
- Published
- 2019
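One of the design choices the paper measures is dynamic masking: instead of freezing the masked positions at preprocessing time as in the original BERT, the pattern is re-sampled every time a sequence is seen. A minimal sketch under the standard 80/10/10 corruption recipe (the function name and the -100 ignore-index convention are illustrative assumptions):

```python
import numpy as np

def dynamic_mask(token_ids, mask_id, vocab_size, mask_prob=0.15, rng=None):
    """Resample a BERT-style mask pattern on the fly (RoBERTa dynamic masking).

    Of the selected positions: 80% become the mask token, 10% a random
    token, 10% stay unchanged; unselected positions get label -100.
    """
    rng = rng or np.random.default_rng()
    ids = np.array(token_ids)
    labels = np.full_like(ids, -100)            # positions ignored by the loss
    selected = rng.random(len(ids)) < mask_prob
    labels[selected] = ids[selected]            # targets are the original tokens
    roll = rng.random(len(ids))
    ids[selected & (roll < 0.8)] = mask_id
    rand_pos = selected & (roll >= 0.8) & (roll < 0.9)
    ids[rand_pos] = rng.integers(0, vocab_size, rand_pos.sum())
    return ids, labels
```

Calling this inside the data loader on every pass, rather than once during preprocessing, is the entire change.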
38. An Axiomatic Analysis for Object Detection and Recognition Using Deep Learning
- Author
-
Goyal, Naman, primary, Goraya, Major Singh, additional, and Singh, Tajinder, additional
- Published
- 2022
- Full Text
- View/download PDF
39. Financial feasibility of concentrated solar power with and without sensible heat storage in hot and dry Indian climate
- Author
-
Goyal, Naman, Aggarwal, Akshansh, and Kumar, Anil
- Published
- 2022
- Full Text
- View/download PDF
40. Concentrated solar power plants: A critical review of regional dynamics and operational parameters
- Author
-
Goyal, Naman, Aggarwal, Akshansh, and Kumar, Anil
- Published
- 2022
- Full Text
- View/download PDF
41. The Social Dynamics of Language Change in Online Networks
- Author
-
Goel, Rahul, Soni, Sandeep, Goyal, Naman, Paparrizos, John, Wallach, Hanna, Diaz, Fernando, and Eisenstein, Jacob
- Subjects
Computer Science - Computation and Language ,Computer Science - Social and Information Networks ,Physics - Physics and Society ,I.2.7 ,J.4 ,J.5 - Abstract
Language change is a complex social phenomenon, revealing pathways of communication and sociocultural influence. While language change has long been a topic of study in sociolinguistics, traditional linguistic research methods rely on circumstantial evidence, estimating the direction of change from differences between older and younger speakers. In this paper, we use a data set of several million Twitter users to track language changes in progress. First, we show that language change can be viewed as a form of social influence: we observe complex contagion for phonetic spellings and "netspeak" abbreviations (e.g., lol), but not for older dialect markers from spoken language. Next, we test whether specific types of social network connections are more influential than others, using a parametric Hawkes process model (sketched after this entry). We find that tie strength plays an important role: densely embedded social ties are significantly better conduits of linguistic influence. Geographic locality appears to play a more limited role: we find relatively little evidence to support the hypothesis that individuals are more influenced by geographically local social ties, even in their usage of geographical dialect markers., Comment: This paper appears in the Proceedings of the International Conference on Social Informatics (SocInfo16). The final publication is available at springer.com
- Published
- 2016
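The parametric Hawkes process mentioned in the abstract models a user's rate of producing a linguistic variant as a base rate plus boosts that decay after each exposure from a network neighbor. A minimal sketch of the intensity with an exponential kernel (single-neighbor case; the names and exact parameterization are illustrative, not the paper's):

```python
import numpy as np

def hawkes_intensity(t, mu, neighbor_event_times, alpha, beta):
    """Event rate of the focal user at time t.

    mu:    the user's base rate of using the variant.
    alpha: influence weight of the neighbor (e.g., scaled by tie strength).
    beta:  decay rate; each exposure's effect fades as exp(-beta * elapsed).
    """
    past = np.asarray([s for s in neighbor_event_times if s < t])
    return mu + alpha * np.exp(-beta * (t - past)).sum()
```

Tie strength enters through alpha: the abstract's finding that densely embedded ties are better conduits corresponds to larger influence weights for strong ties.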
42. Thermal characteristics of sensible heat storage materials applicable for concentrated solar power systems
- Author
-
Aggarwal, Akshansh, Goyal, Naman, and Kumar, Anil
- Published
- 2021
- Full Text
- View/download PDF
43. Digest: Habitat and competition drive diversification in Gondwanan fish
- Author
-
Dhanda, Aavika, primary, Hubbard, Lillian, additional, and Goyal, Naman, additional
- Published
- 2024
- Full Text
- View/download PDF
44. Patterns in spatial segregation between the endemic Nicobar Bulbul (Ixos nicobariensis) and the introduced Red-whiskered Bulbul (Pycnonotus jocosus) on two islands of central Nicobar
- Author
-
Goyal, Naman, Singh, Pratap, K, Sivakumar, and Nair, Manoj V
- Published
- 2019
45. Revisiting the systematic position of the enigmatic Nicobar Bulbul (Ixos nicobariensis).
- Author
-
Goyal, Naman, Warudkar, Ashwin, Kolipakam, Vishnupriya, Pant, Bhawana, Kuppusamy, Sivakumar, Singh, Pratap, Qureshi, Qamar, and Nair, Manoj V.
- Subjects
BIOLOGICAL evolution ,ENDEMIC species ,ISLANDS - Abstract
The Nicobar Bulbul Ixos nicobariensis is an endemic species found across the central Nicobar Islands whose systematic position has been debated for many decades. Its current placement in the genus Ixos is tentative and is based on morphological similarities and geographical affinities to other members of the genus. We used mitochondrial and nuclear markers to examine its phylogenetic relationship to other members of Ixos and Hypsipetes. Our results indicate that the Nicobar Bulbul is sister to members of the genus Hypsipetes, and not Ixos. We further discuss the origin of this species based on the results of our study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Revisiting the systematic position of the enigmatic Nicobar Bulbul (Ixos nicobariensis)
- Author
-
Goyal, Naman, primary, Warudkar, Ashwin, additional, Kolipakam, Vishnupriya, additional, Pant, Bhawana, additional, Kuppusamy, Sivakumar, additional, Singh, Pratap, additional, Qureshi, Qamar, additional, and Nair, Manoj V., additional
- Published
- 2023
- Full Text
- View/download PDF
47. Content Credibility Check on Twitter
- Author
-
Gupta, Priya, Pathak, Vihaan, Goyal, Naman, Singh, Jaskirat, Varshney, Vibhu, Kumar, Sunil, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Deka, Ganesh Chandra, editor, Kaiwartya, Omprakash, editor, Vashisth, Pooja, editor, and Rathee, Priyanka, editor
- Published
- 2018
- Full Text
- View/download PDF
48. Nicobar Bulbul (Ixos nicobariensis)
- Author
-
Goyal, Naman, primary
- Published
- 2023
- Full Text
- View/download PDF
49. Tarsier Goggles: a virtual reality tool for experiencing the optics of a dark-adapted primate visual system
- Author
-
Gochman, Samuel R., Morano Lord, Marilyn, Goyal, Naman, Chow, Kristie, Cooper, Benjamin K., Gray, Lauren K., Guo, Stephanie X., Hill, Kylie A., Liao, Stephen K., Peng, Shiyao, Seong, Hyun J., Wang, Alma, Yoon, Eun K., Zhang, Shirley, Lobel, Erica, Tregubov, Tim, and Dominy, Nathaniel J.
- Published
- 2019
- Full Text
- View/download PDF
50. Global Metabolomic Profiling of Host Red Blood Cells Infected with Babesia divergens Reveals Novel Antiparasitic Target Pathways
- Author
-
Beri, Divya, primary, Singh, Manpreet, additional, Rodriguez, Marilis, additional, Goyal, Naman, additional, Rasquinha, Giselle, additional, Liu, Yunfeng, additional, An, Xiuli, additional, Yazdanbakhsh, Karina, additional, and Lobo, Cheryl A., additional
- Published
- 2023
- Full Text
- View/download PDF