Author: "Mella, Vegard" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Mella, Vegard" returned 13 results

1. RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Author: Gehring, Jonas, Zheng, Kunhao, Copet, Jade, Mella, Vegard, Carbonneaux, Quentin, Cohen, Taco, and Synnaeve, Gabriel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to reliably achieve the desired outcomes. We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis, where state-of-the-art LLMs struggle to improve code iteratively compared to independent sampling. We benchmark on competitive programming tasks, where we achieve new state-of-the-art results with both small (8B parameters) and large (70B) models while reducing the number of samples required by an order of magnitude. Our analysis of inference-time behavior demonstrates that our method produces LLMs that effectively leverage automatic feedback over multiple steps.
Comment: Add repair model ablation, update related work
Published: 2024

2. Dungeons and Data: A Large-Scale NetHack Dataset

Author: Hambro, Eric, Raileanu, Roberta, Rothermel, Danielle, Mella, Vegard, Rocktäschel, Tim, Küttler, Heinrich, and Murray, Naila
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost to work with them. Here we present the NetHack Learning Dataset (NLD), a large and highly-scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and, accompanying code for users to record, load and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms including online and offline RL, as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision making tasks.
Comment: 9 pages, published in the Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. New links to hosting location. Revised results, same conclusions
Published: 2022

3. Insights From the NeurIPS 2021 NetHack Challenge

Author: Hambro, Eric, Mohanty, Sharada, Babaev, Dmitrii, Byeon, Minwoo, Chakraborty, Dipam, Grefenstette, Edward, Jiang, Minqi, Jo, Daejin, Kanervisto, Anssi, Kim, Jongmin, Kim, Sungwoong, Kirk, Robert, Kurin, Vitaly, Küttler, Heinrich, Kwon, Taehwon, Lee, Donghoon, Mella, Vegard, Nardelli, Nantas, Nazarov, Ivan, Ovsov, Nikita, Parker-Holder, Jack, Raileanu, Roberta, Ramanauskas, Karolis, Rocktäschel, Tim, Rothermel, Danielle, Samvelyan, Mikayel, Sorokin, Dmitry, Sypetkowski, Maciej, and Sypetkowski, Michał
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation, Statistics - Machine Learning
Abstract: In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.
Comment: Under review at PMLR for the NeurIPS 2021 Competition Workshop Track, 10 pages + 10 in appendices
Published: 2022

4. Transfer of Fully Convolutional Policy-Value Networks Between Games and Game Variants

Author: Soemers, Dennis J. N. J., Mella, Vegard, Piette, Eric, Stephenson, Matthew, Browne, Cameron, and Teytaud, Olivier
Subjects: Computer Science - Machine Learning
Abstract: In this paper, we use fully convolutional architectures in AlphaZero-like self-play training setups to facilitate transfer between variants of board games as well as distinct games. We explore how to transfer trained parameters of these architectures based on shared semantics of channels in the state and action representations of the Ludii general game system. We use Ludii's large library of games and game variants for extensive transfer learning evaluations, in zero-shot transfer experiments as well as experiments with additional fine-tuning time.
Published: 2021

5. Deep Learning for General Game Playing with Ludii and Polygames

Author: Soemers, Dennis J. N. J., Mella, Vegard, Browne, Cameron, and Teytaud, Olivier
Subjects: Computer Science - Artificial Intelligence
Abstract: Combinations of Monte-Carlo tree search and Deep Neural Networks, trained through self-play, have produced state-of-the-art results for automated game-playing in many board games. The training and search algorithms are not game-specific, but every individual game that these approaches are applied to still requires domain knowledge for the implementation of the game's rules, and constructing the neural network's architecture -- in particular the shapes of its input and output tensors. Ludii is a general game system that already contains over 500 different games, which can rapidly grow thanks to its powerful and user-friendly game description language. Polygames is a framework with training and search algorithms, which has already produced superhuman players for several board games. This paper describes the implementation of a bridge between Ludii and Polygames, which enables Polygames to train and evaluate models for games that are implemented and run through Ludii. We do not require any game-specific domain knowledge anymore, and instead leverage our domain knowledge of the Ludii system and its abstract state and move representations to write functions that can automatically determine the appropriate shapes for input and output tensors for any game implemented in Ludii. We describe experimental results for short training runs in a wide variety of different board games, and discuss several open problems and avenues for future research.
Published: 2021

6. Polygames: Improved Zero Learning

Author: Cazenave, Tristan, Chen, Yen-Chi, Chen, Guan-Wei, Chen, Shi-Yu, Chiu, Xian-Dong, Dehos, Julien, Elsa, Maria, Gong, Qucheng, Hu, Hengyuan, Khalidov, Vasil, Li, Cheng-Ling, Lin, Hsin-I, Lin, Yu-Jin, Martinet, Xavier, Mella, Vegard, Rapin, Jeremy, Roziere, Baptiste, Synnaeve, Gabriel, Teytaud, Fabien, Teytaud, Olivier, Ye, Shi-Cheng, Ye, Yi-Jun, Yen, Shi-Jim, and Zagoruyko, Sergey
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19x19, which was often said to be intractable for zero learning; and in Havannah. We also won several first places at the TAAI competitions.
Published: 2020

7. Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger

Author: Synnaeve, Gabriel, Lin, Zeming, Gehring, Jonas, Gant, Dan, Mella, Vegard, Khalidov, Vasil, Carion, Nicolas, and Usunier, Nicolas
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability of capturing basic game rules and high-level dynamics. By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft: Brood War. Finally, we demonstrate the relevance of our models to downstream tasks by applying them for enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots.
Published: 2018

8. High-Level Strategy Selection under Partial Observability in StarCraft: Brood War

Author: Gehring, Jonas, Ju, Da, Mella, Vegard, Gant, Daniel, Usunier, Nicolas, and Synnaeve, Gabriel
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We consider the problem of high-level strategy selection in the adversarial setting of real-time strategy games from a reinforcement learning perspective, where taking an action corresponds to switching to the respective strategy. Here, a good strategy successfully counters the opponent's current and possible future strategies which can only be estimated using partial observations. We investigate whether we can utilize the full game state information during training time (in the form of an auxiliary prediction task) to increase performance. Experiments carried out within a StarCraft: Brood War bot against strong community bots show substantial win rate improvements over a fixed-strategy baseline and encouraging results when learning with the auxiliary task.
Published: 2018

9. Transfer of Fully Convolutional Policy-Value Networks Between Games and Game Variants

Author: Soemers, Dennis J. N. J., Mella, Vegard, Piette, Eric, Stephenson, Matthew, Browne, Cameron, and Teytaud, Olivier
Subjects: Computer and Information Sciences, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Personal Computing
Abstract: In this paper, we use fully convolutional architectures in AlphaZero-like self-play training setups to facilitate transfer between variants of board games as well as distinct games. We explore how to transfer trained parameters of these architectures based on shared semantics of channels in the state and action representations of the Ludii general game system. We use Ludii's large library of games and game variants for extensive transfer learning evaluations, in zero-shot transfer experiments as well as experiments with additional fine-tuning time.
Published: 2022

10. Deep learning for general game playing with Ludii and Polygames

Author: Soemers, Dennis J.N.J., Mella, Vegard, Browne, Cameron, and Teytaud, Olivier
Published: 2022

11. Polygames: Improved zero learning

Author: Cazenave, Tristan, Chen, Yen-Chi, Chen, Guan-Wei, Chen, Shi-Yu, Chiu, Xian-Dong, Dehos, Julien, Elsa, Maria, Gong, Qucheng, Hu, Hengyuan, Khalidov, Vasil, Li, Cheng-Ling, Lin, Hsin-I, Lin, Yu-Jin, Martinet, Xavier, Mella, Vegard, Rapin, Jeremy, Roziere, Baptiste, Synnaeve, Gabriel, Teytaud, Fabien, Teytaud, Olivier, Ye, Shi-Cheng, Ye, Yi-Jun, Yen, Shi-Jim, and Zagoruyko, Sergey
Published: 2021

12. Deep learning for general game playing with Ludii and Polygames

Author: Soemers, Dennis J.N.J., Mella, Vegard, Browne, Cameron, and Teytaud, Olivier
Subjects: Deep learning, Monte Carlo method, Artificial neural networks, Board games, Games
Abstract: Combinations of Monte-Carlo tree search and Deep Neural Networks, trained through self-play, have produced state-of-the-art results for automated game-playing in many board games. The training and search algorithms are not game-specific, but every individual game that these approaches are applied to still requires domain knowledge for the implementation of the game's rules, and constructing the neural network's architecture – in particular the shapes of its input and output tensors. Ludii is a general game system that already contains over 1,000 different games, which can rapidly grow thanks to its powerful and user-friendly game description language. Polygames is a framework with training and search algorithms, which has already produced superhuman players for several board games. This paper describes the implementation of a bridge between Ludii and Polygames, which enables Polygames to train and evaluate models for games that are implemented and run through Ludii. We do not require any game-specific domain knowledge anymore, and instead leverage our domain knowledge of the Ludii system and its abstract state and move representations to write functions that can automatically determine the appropriate shapes for input and output tensors for any game implemented in Ludii. We describe experimental results for short training runs in a wide variety of different board games, and discuss several open problems and avenues for future research.
Published: 2021

13. Deep learning for general game playing with Ludii and Polygames

Author: Soemers, Dennis J.N.J., Mella, Vegard, Browne, Cameron, and Teytaud, Olivier
Abstract: Combinations of Monte-Carlo tree search and Deep Neural Networks, trained through self-play, have produced state-of-the-art results for automated game-playing in many board games. The training and search algorithms are not game-specific, but every individual game that these approaches are applied to still requires domain knowledge for the implementation of the game’s rules, and constructing the neural network’s architecture – in particular the shapes of its input and output tensors. Ludii is a general game system that already contains over 1,000 different games, which can rapidly grow thanks to its powerful and user-friendly game description language. Polygames is a framework with training and search algorithms, which has already produced superhuman players for several board games. This paper describes the implementation of a bridge between Ludii and Polygames, which enables Polygames to train and evaluate models for games that are implemented and run through Ludii. We do not require any game-specific domain knowledge anymore, and instead leverage our domain knowledge of the Ludii system and its abstract state and move representations to write functions that can automatically determine the appropriate shapes for input and output tensors for any game implemented in Ludii. We describe experimental results for short training runs in a wide variety of different board games, and discuss several open problems and avenues for future research.
Published: 2022
