9 results on "Huizinga, Joost"
Search Results
2. First return, then explore
- Author
-
Ecoffet, Adrien, Huizinga, Joost, Lehman, Joel, Stanley, Kenneth O., and Clune, Jeff
- Subjects
Data mining -- Usage, Reinforcement learning (Machine learning) -- Usage, Algorithms -- Usage, Data warehousing/data mining, Algorithm, Environmental issues, Science and technology, Zoology and wildlife conservation
- Abstract
Reinforcement learning promises to solve complex sequential-decision problems autonomously by specifying a high-level reward function only. However, reinforcement learning algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse^1 and deceptive^2 feedback. Avoiding these pitfalls requires a thorough exploration of the environment, but creating algorithms that can do so remains one of the central challenges of the field. Here we hypothesize that the main impediment to effective exploration originates from algorithms forgetting how to reach previously visited states (detachment) and failing to first return to a state before exploring from it (derailment). We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly 'remembering' promising states and returning to such states before intentionally exploring. Go-Explore solves all previously unsolved Atari games and surpasses the state of the art on all hard-exploration games^1, with orders-of-magnitude improvements on the grand challenges of Montezuma's Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task. Additionally, we show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training. The substantial performance gains from Go-Explore suggest that the simple principles of remembering states, returning to them, and exploring from them are a powerful and general approach to exploration, an insight that may prove critical to the creation of truly intelligent learning agents.
A reinforcement learning algorithm that explicitly remembers promising states and returns to them as a basis for further exploration solves all as-yet-unsolved Atari games and outperforms previous algorithms on Montezuma's Revenge and Pitfall. Author(s): Adrien Ecoffet^1,2, Joost Huizinga^1,2, Joel Lehman^1,2, Kenneth O. Stanley^1,2, Jeff Clune^1,2. Author Affiliations: (1) Uber [...]
- Published
- 2021
- Full Text
- View/download PDF
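The "remember, return, explore" loop that the abstract above describes can be sketched in a few lines. Everything here is an illustrative stand-in, not the authors' implementation: `ChainEnv` is a toy hard-exploration task, cells are raw positions rather than a learned discretization, and the function and parameter names are hypothetical.

```python
import random

class ChainEnv:
    """Toy deterministic chain: positions 0..n-1, actions 0 (left) / 1 (right);
    reward 1.0 only at the far end. Stands in for a hard-exploration task."""
    def __init__(self, n=8):
        self.n = n
        self.action_space = [0, 1]
        self.pos = 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos = max(0, min(self.n - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.n - 1
        return self.pos, (1.0 if done else 0.0), done, {}

def go_explore(env, n_iterations=200, explore_steps=10, seed=0):
    """Archive maps cell (here, a raw position) -> (trajectory of actions, score)."""
    rng = random.Random(seed)
    start = env.reset()
    archive = {start: ([], 0.0)}
    for _ in range(n_iterations):
        # Go: pick a remembered cell and return to it by replaying its trajectory
        cell = rng.choice(list(archive))
        trajectory, score = archive[cell]
        env.reset()
        for a in trajectory:
            env.step(a)
        # Explore: take random actions from the restored state
        traj, total = list(trajectory), score
        for _ in range(explore_steps):
            a = rng.choice(env.action_space)
            obs, r, done, _ = env.step(a)
            traj.append(a)
            total += r
            # Remember: keep the best-scoring route found to each cell
            if obs not in archive or total > archive[obs][1]:
                archive[obs] = (list(traj), total)
            if done:
                break
    return archive
```

Because returning is done by deterministic replay, every stored trajectory provably lands on its cell, which is what separates this scheme from undirected random exploration.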
3. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
- Author
-
Baker, Bowen, Akkaya, Ilge, Zhokhov, Peter, Huizinga, Joost, Tang, Jie, Ecoffet, Adrien, Houghton, Brandon, Sampedro, Raul, and Clune, Jeff
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
- Abstract
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data -- here, online videos of people playing Minecraft -- from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.
- Published
- 2022
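The three-stage pipeline the VPT abstract describes (train an inverse dynamics model on a small labeled set, pseudo-label a large unlabeled video corpus, then behavior-clone on the pseudo-labels) can be sketched with trivial lookup-table "models" purely to show the data flow. The function names (`train_idm`, `pseudo_label`, `behavior_clone`) are hypothetical, not from the authors' code.

```python
from collections import Counter, defaultdict

def train_idm(labeled):
    """labeled: list of ((s, s_next), action). The inverse dynamics model (IDM)
    maps an observed transition to the action that caused it."""
    return {transition: action for transition, action in labeled}

def pseudo_label(idm, videos):
    """videos: list of state sequences with no action labels. Use the IDM to
    emit (state, action) training pairs for transitions it recognizes."""
    data = []
    for states in videos:
        for s, s_next in zip(states, states[1:]):
            if (s, s_next) in idm:
                data.append((s, idm[(s, s_next)]))
    return data

def behavior_clone(data):
    """Majority-vote policy: state -> most frequent pseudo-labeled action."""
    votes = defaultdict(Counter)
    for s, a in data:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}
```

The point of the pipeline is leverage: a few labeled transitions let the IDM annotate an arbitrarily large unlabeled corpus, and the policy is trained on that much larger pseudo-labeled set.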
4. Evolving Multimodal Robot Behavior via Many Stepping Stones with the Combinatorial Multiobjective Evolutionary Algorithm.
- Author
-
Huizinga, Joost and Clune, Jeff
- Subjects
ROBOTS, MULTIMODAL user interfaces, PROBLEM solving, REINFORCEMENT learning, SIMULATED annealing
- Abstract
An important challenge in reinforcement learning is to solve multimodal problems, where agents have to act in qualitatively different ways depending on the circumstances. Because multimodal problems are often too difficult to solve directly, it is often helpful to define a curriculum, which is an ordered set of subtasks that can serve as the stepping stones for solving the overall problem. Unfortunately, choosing an effective ordering for these subtasks is difficult, and a poor ordering can reduce the performance of the learning process. Here, we provide a thorough introduction and investigation of the Combinatorial Multiobjective Evolutionary Algorithm (CMOEA), which allows all combinations of subtasks to be explored simultaneously. We compare CMOEA against three algorithms that can similarly optimize on multiple subtasks simultaneously: NSGA-II, NSGA-III, and ε-Lexicase Selection. The algorithms are tested on a function-optimization problem with two subtasks, a simulated multimodal robot locomotion problem with six subtasks, and a simulated robot maze-navigation problem where a hundred random mazes are treated as subtasks. On these problems, CMOEA either outperforms or is competitive with the controls. As a separate contribution, we show that adding a linear combination over all objectives can improve the ability of the control algorithms to solve these multimodal problems. Lastly, we show that CMOEA can leverage auxiliary objectives more effectively than the controls on the multimodal locomotion task. In general, our experiments suggest that CMOEA is a promising algorithm for solving multimodal problems.
- Published
- 2022
- Full Text
- View/download PDF
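The key structure the CMOEA abstract describes, one "bin" per combination of subtasks so that every combination is explored simultaneously, can be sketched as follows. The function name `cmoea_bins` and the product-of-fitnesses scoring rule are illustrative simplifications, not the paper's exact method.

```python
from itertools import combinations

def cmoea_bins(population, subtask_scores, n_subtasks):
    """One bin per non-empty subtask combination; each bin keeps the individual
    that best solves that combination. subtask_scores(ind) -> list of
    per-subtask fitnesses in [0, 1]."""
    bins = {}
    combos = [c for r in range(1, n_subtasks + 1)
              for c in combinations(range(n_subtasks), r)]
    for ind in population:
        scores = subtask_scores(ind)
        for combo in combos:
            # A product rewards individuals competent on EVERY subtask in the
            # combo, so specialists and generalists each win different bins.
            fitness = 1.0
            for t in combo:
                fitness *= scores[t]
            if combo not in bins or fitness > bins[combo][1]:
                bins[combo] = (ind, fitness)
    return bins
```

Keeping all 2^n - 1 combinations alive at once is what lets the algorithm avoid committing to one subtask ordering, at the cost of a bin count exponential in the number of subtasks.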
5. Guiding Neuroevolution with Structural Objectives.
- Author
-
Ellefsen, Kai Olav, Huizinga, Joost, and Torresen, Jim
- Subjects
EVOLUTIONARY algorithms, MODULAR forms, MATHEMATICAL decomposition, NETWORK performance
- Abstract
The structure and performance of neural networks are intimately connected, and by use of evolutionary algorithms, neural network structures optimally adapted to a given task can be explored. Guiding such neuroevolution with additional objectives related to network structure has been shown to improve performance in some cases, especially when modular neural networks are beneficial. However, apart from objectives aiming to make networks more modular, such structural objectives have not been widely explored. We propose two new structural objectives and test their ability to guide evolving neural networks on two problems which can benefit from decomposition into subtasks. The first structural objective guides evolution to align neural networks with a user-recommended decomposition pattern. Intuitively, this should be a powerful guiding target for problems where human users can easily identify a structure. The second structural objective guides evolution towards a population with a high diversity in decomposition patterns. This results in exploration of many different ways to decompose a problem, allowing evolution to find good decompositions faster. Tests on our target problems reveal that both methods perform well on a problem with a very clear and decomposable structure. However, on a problem where the optimal decomposition is less obvious, the structural diversity objective is found to outcompete other structural objectives, and this technique can even increase performance on problems without any decomposable structure at all.
- Published
- 2020
- Full Text
- View/download PDF
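The two structural objectives described in the abstract above can be phrased as scoring functions over a network's "decomposition pattern" (here represented simply as a tuple assigning each neuron to a module). Both function names and the Hamming-distance diversity measure are illustrative assumptions, not the paper's exact formulations.

```python
def alignment_objective(pattern, recommended):
    """Objective 1: reward matching a user-recommended decomposition,
    element-wise; returns a fraction in [0, 1]."""
    matches = sum(p == r for p, r in zip(pattern, recommended))
    return matches / len(recommended)

def diversity_objective(pattern, population_patterns):
    """Objective 2: reward patterns far (in mean normalized Hamming distance)
    from the rest of the population, pushing evolution to try many
    decompositions at once."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    others = [p for p in population_patterns if p != pattern]
    if not others:
        return 1.0
    return sum(hamming(pattern, p) for p in others) / (len(others) * len(pattern))
```

The first objective assumes a human can supply the target structure; the second needs no such prior, which matches the abstract's finding that it helps most when the optimal decomposition is not obvious.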
6. The Emergence of Canalization and Evolvability in an Open-Ended, Interactive Evolutionary System.
- Author
-
Huizinga, Joost, Stanley, Kenneth O., and Clune, Jeff
- Subjects
EVOLUTIONARY computation, EVOLUTIONARY algorithms, GENETIC engineering, ORGANISMS, EVOLUTIONARY theories
- Abstract
Many believe that an essential component for the discovery of the tremendous diversity in natural organisms was the evolution of evolvability, whereby evolution speeds up its ability to innovate by generating a more adaptive pool of offspring. One hypothesized mechanism for evolvability is developmental canalization, wherein certain dimensions of variation become more likely to be traversed and others are prevented from being explored (e.g., offspring tend to have similar-size legs, and mutations affect the length of both legs, not each leg individually). While ubiquitous in nature, canalization is rarely reported in computational simulations of evolution, which deprives us of in silico examples of canalization to study and raises the question of which conditions give rise to this form of evolvability. Answering this question would shed light on why such evolvability emerged naturally, and it could accelerate engineering efforts to harness evolution to solve important engineering challenges. In this article, we reveal a unique system in which canalization did emerge in computational evolution. We document that genomes entrench certain dimensions of variation that were frequently explored during their evolutionary history. The genetic representation of these organisms also evolved to be more modular and hierarchical than expected by chance, and we show that these organizational properties correlate with increased fitness. Interestingly, the type of computational evolutionary experiment that produced this evolvability was very different from traditional digital evolution in that there was no objective, suggesting that open-ended, divergent evolutionary processes may be necessary for the evolution of evolvability.
- Published
- 2018
- Full Text
- View/download PDF
7. The Evolutionary Origins of Hierarchy.
- Author
-
Mengistu, Henok, Huizinga, Joost, Mouret, Jean-Baptiste, and Clune, Jeff
- Subjects
BIOLOGICAL evolution, BIOLOGICAL adaptation, HIERARCHICAL clustering (Cluster analysis), ARTIFICIAL intelligence, ROBOTICS
- Abstract
Hierarchical organization (the recursive composition of sub-modules) is ubiquitous in biological networks, including neural, metabolic, ecological, and genetic regulatory networks, and in human-made systems, such as large organizations and the Internet. To date, most research on hierarchy in networks has been limited to quantifying this property. However, an open, important question in evolutionary biology is why hierarchical organization evolves in the first place. It has recently been shown that modularity evolves because of the presence of a cost for network connections. Here we investigate whether such connection costs also tend to cause a hierarchical organization of such modules. In computational simulations, we find that networks without a connection cost do not evolve to be hierarchical, even when the task has a hierarchical structure. However, with a connection cost, networks evolve to be both modular and hierarchical, and these networks exhibit higher overall performance and evolvability (i.e., faster adaptation to new environments). Additional analyses confirm that hierarchy independently improves adaptability after controlling for modularity. Overall, our results suggest that the same force, the cost of connections, promotes the evolution of both hierarchy and modularity, and that these properties are important drivers of network performance and adaptability. In addition to shedding light on the emergence of hierarchy across the many domains in which it appears, these findings will also accelerate future research into evolving more complex, intelligent computational brains in the fields of artificial intelligence and robotics.
- Published
- 2016
- Full Text
- View/download PDF
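The selection pressure this abstract describes, where networks are judged on task performance and also pay a cost for connections, is commonly imposed via multi-objective (Pareto) selection. The sketch below is an illustrative simplification under that assumption, not the paper's exact algorithm; each network is reduced to a (performance, connection count) pair.

```python
def dominates(net_a, net_b):
    """net = (performance, n_connections). net_a dominates net_b if it is no
    worse on both objectives (higher performance, fewer connections) and
    strictly better on at least one."""
    perf_a, conn_a = net_a
    perf_b, conn_b = net_b
    no_worse = perf_a >= perf_b and conn_a <= conn_b
    strictly_better = perf_a > perf_b or conn_a < conn_b
    return no_worse and strictly_better

def pareto_front(nets):
    """Keep only networks not dominated by any other: the survivors trade off
    performance against wiring cost, which is the pressure that, per the
    abstract, yields modular and hierarchical structure."""
    return [n for n in nets if not any(dominates(m, n) for m in nets if m != n)]
```

Under this scheme a sparsely wired network can survive alongside a higher-performing but densely wired one, so cheap wiring is never simply traded away.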
8. Evolving neural networks that are both modular and regular.
- Author
-
Huizinga, Joost, Clune, Jeff, and Mouret, Jean-Baptiste
- Published
- 2014
- Full Text
- View/download PDF
9. Environmental, individual and social traits of free-ranging raccoons influence performance in cognitive testing.
- Author
-
Stanton LA, Bridge ES, Huizinga J, and Benson-Amram S
- Subjects
- Animals, Neuropsychological Tests, Reversal Learning, Sociological Factors, Animals, Wild, Raccoons psychology
- Abstract
Cognitive abilities, such as learning and flexibility, are hypothesized to aid behavioral adaptation to urbanization. Although growing evidence suggests that cognition may indeed facilitate persistence in urban environments, we currently lack knowledge of the cognitive abilities of many urban taxa. Recent methodological advances, including radio frequency identification (RFID), have extended automated cognitive testing into the field but have yet to be applied to a diversity of taxa. Here, we used an RFID-enabled operant conditioning device to assess the habituation, learning and cognitive flexibility of a wild population of raccoons (Procyon lotor). We examined how several biological and behavioral traits influenced participation and performance in testing. We then compared the cognitive performance of wild raccoons tested in natural conditions with that of wild-caught raccoons tested in captivity from a previous study. In natural conditions, juvenile raccoons were more likely to habituate to the testing device, but performed worse in serial reversal learning, compared with adults. We also found that docile raccoons were more likely to learn how to operate the device in natural conditions, which suggests a relationship between emotional reactivity and cognitive ability in raccoons. Although raccoons in both captive and natural conditions demonstrated rapid associative learning and flexibility, raccoons in captive conditions generally performed better, likely owing to the heightened vigilance and social interference experienced by raccoons in natural conditions. Our results have important implications for future research on urban carnivores and cognition in field settings, as well as our understanding of behavioral adaptation to urbanization and coexistence with urban wildlife.
Competing interests: The authors declare no competing or financial interests.
© 2022. Published by The Company of Biologists Ltd.
- Published
- 2022
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library