1,249 results for "Stefano, V."
Search Results
2. Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning
- Author
Kapoor, Aditya, Swamy, Sushant, Tessera, Kale-ab, Baranwal, Mayank, Sun, Mingfei, Khadilkar, Harshad, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
In multi-agent environments, agents often struggle to learn optimal policies due to sparse or delayed global rewards, particularly in long-horizon tasks where it is challenging to evaluate actions at intermediate time steps. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), a novel approach designed to address the agent-temporal credit assignment problem by redistributing sparse rewards both temporally and across agents. TAR$^2$ decomposes sparse global rewards into time-step-specific rewards and calculates agent-specific contributions to these rewards. We theoretically prove that TAR$^2$ is equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirical results demonstrate that TAR$^2$ stabilizes and accelerates the learning process. Additionally, we show that when TAR$^2$ is integrated with single-agent reinforcement learning algorithms, it performs as well as or better than traditional multi-agent reinforcement learning methods., Comment: 12 pages, 1 figure
- Published
- 2024
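To make the agent-temporal redistribution idea above concrete, here is a minimal, hypothetical sketch (not the authors' implementation): one network produces temporal weights, another produces per-agent weights, and the sparse episodic return is split so that the dense rewards sum back to the original return, which is the property that keeps the optimal policy unchanged. All module names, shapes, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class RewardRedistributor(nn.Module):
    """Toy illustration: split one episodic return into per-timestep,
    per-agent rewards whose sum equals the original return."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.temporal = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.agent = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, joint_obs: torch.Tensor, episode_return: float) -> torch.Tensor:
        # joint_obs: (T, N, obs_dim) observations for T timesteps and N agents.
        w_t = torch.softmax(self.temporal(joint_obs.mean(dim=1)).squeeze(-1), dim=0)  # (T,)
        w_a = torch.softmax(self.agent(joint_obs).squeeze(-1), dim=1)                 # (T, N), rows sum to 1
        dense_rewards = episode_return * w_t.unsqueeze(-1) * w_a                      # (T, N)
        return dense_rewards  # sums (up to floating point) to episode_return

if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardRedistributor(obs_dim=8)
    obs = torch.randn(20, 3, 8)                  # 20 timesteps, 3 agents
    dense = model(obs, episode_return=1.0)
    print(dense.shape, float(dense.sum()))       # torch.Size([20, 3]), ~1.0
```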
3. HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
- Author
Tessera, Kale-ab Abebe, Rahman, Arrasy, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
- Abstract
Adaptability is critical in cooperative multi-agent reinforcement learning (MARL), where agents must learn specialised or homogeneous behaviours for diverse tasks. While parameter sharing methods are sample-efficient, they often encounter gradient interference among agents, limiting their behavioural diversity. Conversely, non-parameter sharing approaches enable specialisation, but are computationally demanding and sample-inefficient. To address these issues, we propose HyperMARL, a parameter sharing approach that uses hypernetworks to dynamically generate agent-specific actor and critic parameters, without altering the learning objective or requiring preset diversity levels. By decoupling observation- and agent-conditioned gradients, HyperMARL empirically reduces policy gradient variance and facilitates specialisation within full parameter sharing (FuPS), suggesting it can mitigate cross-agent interference. Across multiple MARL benchmarks involving up to twenty agents -- and requiring homogeneous, heterogeneous, or mixed behaviours -- HyperMARL consistently performs competitively with fully shared, non-parameter-sharing, and diversity-promoting baselines, all while preserving a behavioural diversity level comparable to non-parameter sharing. These findings establish hypernetworks as a versatile approach for MARL across diverse environments.
- Published
- 2024
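A rough sketch of the hypernetwork idea from the entry above, under assumptions about the architecture: a shared hypernetwork maps a learned agent embedding to the weights of that agent's policy head, so all agents share parameters while still receiving agent-specific actors. Names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class HyperActor(nn.Module):
    """Toy hypernetwork: one shared network generates per-agent policy-head
    weights from a learned agent embedding (not the paper's exact architecture)."""
    def __init__(self, n_agents: int, obs_dim: int, n_actions: int, embed_dim: int = 16):
        super().__init__()
        self.obs_dim, self.n_actions = obs_dim, n_actions
        self.agent_embed = nn.Embedding(n_agents, embed_dim)
        # The hypernetwork emits a flattened (obs_dim x n_actions) weight matrix and a bias.
        self.hyper_w = nn.Linear(embed_dim, obs_dim * n_actions)
        self.hyper_b = nn.Linear(embed_dim, n_actions)

    def forward(self, obs: torch.Tensor, agent_id: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); agent_id: (batch,) integer ids.
        e = self.agent_embed(agent_id)                                # (batch, embed_dim)
        W = self.hyper_w(e).view(-1, self.obs_dim, self.n_actions)   # (batch, obs_dim, n_actions)
        b = self.hyper_b(e)                                           # (batch, n_actions)
        logits = torch.bmm(obs.unsqueeze(1), W).squeeze(1) + b        # agent-specific policy head
        return torch.distributions.Categorical(logits=logits).sample()

if __name__ == "__main__":
    torch.manual_seed(0)
    actor = HyperActor(n_agents=4, obs_dim=10, n_actions=5)
    obs = torch.randn(8, 10)
    ids = torch.randint(0, 4, (8,))
    print(actor(obs, ids).shape)   # torch.Size([8])
```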
4. Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning
- Author
Yu, Xuehui, Dunion, Mhairi, Li, Xin, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
- Abstract
Meta-Reinforcement Learning (Meta-RL) agents can struggle to operate across tasks with varying environmental features that require different optimal skills (i.e., different modes of behaviour). Using context encoders based on contrastive learning to enhance the generalisability of Meta-RL agents is now widely studied but faces challenges such as the requirement for a large sample size, also referred to as the $\log$-$K$ curse. To improve RL generalisation to different tasks, we first introduce Skill-aware Mutual Information (SaMI), an optimisation objective that aids in distinguishing context embeddings according to skills, thereby equipping RL agents with the ability to identify and execute different skills across tasks. We then propose Skill-aware Noise Contrastive Estimation (SaNCE), a $K$-sample estimator used to optimise the SaMI objective. We provide a framework for equipping an RL agent with SaNCE in practice and conduct experimental validation on modified MuJoCo and Panda-gym benchmarks. We empirically find that RL agents that learn by maximising SaMI achieve substantially improved zero-shot generalisation to unseen tasks. Additionally, the context encoder trained with SaNCE demonstrates greater robustness to a reduction in the number of available samples, thus possessing the potential to overcome the $\log$-$K$ curse., Comment: The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
- Published
- 2024
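The following is a simplified, assumption-laden illustration of a skill-aware contrastive objective in the spirit of the entry above: context embeddings from trajectories that exercise the same skill are treated as positives in an InfoNCE-style loss. It is not the SaNCE estimator itself; the grouping variable, temperature, and batch construction are placeholders.

```python
import torch
import torch.nn.functional as F

def skill_aware_info_nce(embeddings: torch.Tensor, skills: torch.Tensor, tau: float = 0.1):
    """Simplified skill-aware contrastive loss: pull together context embeddings
    from trajectories requiring the same skill, push apart different skills."""
    z = F.normalize(embeddings, dim=-1)                   # (B, d)
    sim = z @ z.t() / tau                                 # (B, B) scaled cosine similarities
    B = z.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=z.device)
    pos = (skills.unsqueeze(0) == skills.unsqueeze(1)) & ~eye
    logits = sim.masked_fill(eye, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average log-probability of each anchor's positives (same-skill samples).
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    z = torch.randn(16, 32, requires_grad=True)
    skills = torch.randint(0, 4, (16,))
    print(float(skill_aware_info_nce(z, skills)))
```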
5. Highway Graph to Accelerate Reinforcement Learning
- Author
Yin, Zidu, Zhang, Zhen, Gong, Dong, Albrecht, Stefano V., and Shi, Javen Q.
- Subjects
Computer Science - Machine Learning
- Abstract
Reinforcement Learning (RL) algorithms often struggle with low training efficiency. A common approach to address this challenge is to integrate model-based planning algorithms, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), with the environment model. However, VI requires iterating over a large tensor which updates the value of the preceding state based on the succeeding state through value propagation, resulting in computationally intensive operations. To enhance the RL training efficiency, we propose improving the efficiency of the value learning process. In deterministic environments with discrete state and action spaces, we observe that on the sampled empirical state-transition graph, a non-branching sequence of transitions, termed a highway, can take the agent to another state without deviation through intermediate states. On these non-branching highways, the value-updating process can be streamlined into a single-step operation, eliminating the need for step-by-step updates. Building on this observation, we introduce the highway graph to model state transitions. The highway graph compresses the transition model into a compact representation, where edges can encapsulate multiple state transitions, enabling value propagation across multiple time steps in a single iteration. By integrating the highway graph into RL, the training process is significantly accelerated, particularly in the early stages of training. Experiments across four categories of environments demonstrate that our method learns significantly faster than established and state-of-the-art RL algorithms (often by a factor of 10 to 150) while maintaining equal or superior expected returns. Furthermore, a deep neural network-based agent trained using the highway graph exhibits improved generalization capabilities and reduced storage costs. Code is publicly available at https://github.com/coodest/highwayRL., Comment: Published in TMLR
- Published
- 2024
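A toy sketch of the highway idea described above, assuming a deterministic environment with discrete states: non-branching chains of transitions are collapsed into single multi-step edges carrying the accumulated discounted reward, and value iteration then propagates values across many primitive steps per update. This is illustrative only; the linked repository contains the actual implementation.

```python
from collections import defaultdict

GAMMA = 0.99

def compress_highways(transitions):
    """transitions: list of (s, a, r, s2) from a deterministic environment.
    Collapse non-branching chains of states into single multi-step 'highway' edges."""
    out = defaultdict(list)               # s -> [(a, r, s2)]
    indeg = defaultdict(int)
    for s, a, r, s2 in transitions:
        out[s].append((a, r, s2))
        indeg[s2] += 1

    def is_intermediate(s):               # exactly one way in and one way out
        return len(out[s]) == 1 and indeg[s] == 1

    edges = {}                            # (s, first_action) -> (discounted_reward, n_steps, end_state)
    for s in list(out):
        if is_intermediate(s):
            continue                      # highways only start at branching (or start) states
        for a, r, s2 in out[s]:
            total, steps = r, 1
            while is_intermediate(s2) and steps <= len(out):   # length guard against cycles
                _, r2, s2 = out[s2][0]
                total += (GAMMA ** steps) * r2
                steps += 1
            edges[(s, a)] = (total, steps, s2)
    return edges

def highway_value_iteration(edges, n_iters=100):
    """One backup per highway edge propagates value across many primitive steps."""
    states = {s for s, _ in edges} | {dst for _, _, dst in edges.values()}
    V = {s: 0.0 for s in states}
    for _ in range(n_iters):
        for s in states:
            candidates = [r + (GAMMA ** k) * V[dst]
                          for (src, _), (r, k, dst) in edges.items() if src == s]
            if candidates:
                V[s] = max(candidates)
    return V

if __name__ == "__main__":
    # Chain 0 -> 1 -> 2 -> 3 (reward on the last step), plus a direct shortcut 0 -> 3.
    trans = [(0, "a", 0.0, 1), (1, "a", 0.0, 2), (2, "a", 1.0, 3), (0, "b", 0.1, 3)]
    edges = compress_highways(trans)
    print(edges)                           # (0, 'a') collapses the three-step chain into one edge
    print(highway_value_iteration(edges))
```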
6. Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems
- Author
Keren, Sarah, Essayeh, Chaimaa, Albrecht, Stefano V., and Morstyn, Thomas
- Subjects
Computer Science - Artificial Intelligence
- Abstract
The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges. These have rendered traditional centralized energy-market paradigms insufficient due to their inability to support the dynamic and evolving nature of the network. This survey explores how multi-agent reinforcement learning (MARL) can support the decentralization and decarbonization of energy networks and mitigate the associated challenges. This is achieved by specifying key computational challenges in managing energy networks, reviewing recent research progress on addressing them, and highlighting open challenges that may be addressed using MARL.
- Published
- 2024
7. LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots
- Author
Han, Dongge, McInroe, Trevor, Jelley, Adam, Albrecht, Stefano V., Bell, Peter, and Storkey, Amos
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence
- Abstract
Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics. Our LLM-Personalize framework features an LLM planner that performs iterative planning in multi-room, partially-observable household scenarios, making use of a scene graph constructed with local observations. The generated plan consists of a sequence of high-level actions which are subsequently executed by a controller. Central to our approach is the optimization pipeline, which combines imitation learning and iterative self-training to personalize the LLM planner. In particular, the imitation learning phase performs initial LLM alignment from demonstrations, and bootstraps the model to facilitate effective iterative self-training, which further explores and aligns the model to user preferences. We evaluate LLM-Personalize on Housekeep, a challenging simulated real-world 3D benchmark for household rearrangements, and show that LLM-Personalize achieves more than a 30 percent increase in success rate over existing LLM planners, showcasing significantly improved alignment with human preferences. Project page: https://gdg94.github.io/projectllmpersonalize/., Comment: COLING 2025
- Published
- 2024
8. Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras
- Author
Dunion, Mhairi and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment. Additionally, cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras during training to learn a policy that is robust to a reduction in the number of cameras and generalises to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from multiple cameras, with a shared representation that is aligned across all cameras to allow generalisation to a single camera, and a private representation that is camera-specific. We show experimentally that an RL agent trained on a single third-person camera is unable to learn an optimal policy in many control tasks; but our approach, benefiting from multiple cameras during training, is able to solve the task using only the same single third-person camera., Comment: Reinforcement Learning Conference (RLC), 2024
- Published
- 2024
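A minimal sketch of the shared/private split described above (not the paper's exact losses): each camera's encoder emits a shared part, aligned across cameras by an auxiliary loss so that any single camera suffices at deployment, and a private, camera-specific part. A full version would also keep private and shared parts dissimilar; dimensions and losses here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraEncoder(nn.Module):
    """Splits one camera's observation into a shared and a private representation."""
    def __init__(self, obs_dim: int, shared_dim: int = 16, private_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, shared_dim + private_dim))
        self.shared_dim = shared_dim

    def forward(self, x):
        h = self.net(x)
        return h[..., :self.shared_dim], h[..., self.shared_dim:]   # (shared, private)

def alignment_loss(encoders, views):
    """Auxiliary loss sketch: shared parts should agree across all cameras,
    so the policy can later act from any single camera."""
    shared = [enc(v)[0] for enc, v in zip(encoders, views)]
    loss = 0.0
    for i in range(len(shared)):
        for j in range(i + 1, len(shared)):
            loss = loss + F.mse_loss(shared[i], shared[j])
    return loss

if __name__ == "__main__":
    torch.manual_seed(0)
    encoders = [CameraEncoder(obs_dim=32) for _ in range(3)]   # three cameras
    views = [torch.randn(8, 32) for _ in range(3)]             # same scene, three viewpoints
    print(float(alignment_loss(encoders, views)))
```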
9. People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI
- Author
Gyevnar, Balint, Droop, Stephanie, Quillien, Tadeg, Cohen, Shay B., Bramley, Neil R., Lucas, Christopher G., and Albrecht, Stefano V.
- Subjects
Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Robotics
- Abstract
It is often argued that effective human-centered explainable artificial intelligence (XAI) should resemble human reasoning. However, empirical investigations of how concepts from cognitive science can aid the design of XAI are lacking. Based on insights from cognitive science, we propose a framework of explanatory modes to analyze how people frame explanations, whether mechanistic, teleological, or counterfactual. Using the complex safety-critical domain of autonomous driving, we conduct an experiment consisting of two studies on (i) how people explain the behavior of a vehicle in 14 unique scenarios (N1=54) and (ii) how they perceive these explanations (N2=382), curating the novel Human Explanations for Autonomous Driving Decisions (HEADD) dataset. Our main finding is that participants deem teleological explanations significantly better quality than counterfactual ones, with perceived teleology being the best predictor of perceived quality. Based on our results, we argue that explanatory modes are an important axis of analysis when designing and evaluating XAI and highlight the need for a principled and empirically grounded understanding of the cognitive mechanisms of explanation. The HEADD dataset and our code are available at: https://datashare.ed.ac.uk/handle/10283/8930., Comment: CHI 2025
- Published
- 2024
10. Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review
- Author
Kuznietsov, Anton, Gyevnar, Balint, Wang, Cheng, Peters, Steven, and Albrecht, Stefano V.
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
- Abstract
Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
- Published
- 2024
11. DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
- Author
Garcin, Samuel, Doran, James, Guo, Shangmin, Lucas, Christopher G., and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when these environments share characteristics with the ones they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which assume control over level generation. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained to approximate the ground truth distribution of an initial set of level parameters. Through its grounding, DRED achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods. Our code and experimental data are available at https://github.com/uoe-agents/dred., Comment: To appear in ICML 2024. A preliminary version of this work (arXiv:2310.03494) was presented at the ALOE workshop, NeurIPS 2023. arXiv admin note: text overlap with arXiv:2310.03494
- Published
- 2024
12. lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning
- Author
Guo, Shangmin, Ren, Yi, Albrecht, Stefano V., and Smith, Kenny
- Subjects
Computer Science - Machine Learning
- Abstract
Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation. In this work, we start from approximating the interaction between samples, i.e. how learning one sample would modify the model's prediction on other samples. Through analysing the terms involved in weight updates in supervised learning, we find that labels influence the interaction between samples. Therefore, we propose the labelled pseudo Neural Tangent Kernel (lpNTK) which takes label information into consideration when measuring the interactions between samples. We first prove that lpNTK asymptotically converges to the empirical neural tangent kernel in terms of the Frobenius norm under certain assumptions. Secondly, we illustrate how lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. Moreover, we also show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs., Comment: ICLR-2024
- Published
- 2024
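As a rough illustration of measuring sample interaction through gradients, the sketch below builds a kernel from pairwise inner products of per-sample loss gradients; because the loss gradient depends on the label, the kernel is label-aware. The actual lpNTK is defined differently in the paper; this only conveys the underlying intuition, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_interaction_kernel(model, xs, ys):
    """Rough sketch: K[i, j] = <grad_theta loss(x_i, y_i), grad_theta loss(x_j, y_j)>.
    Large positive entries suggest that learning one sample also "learns" the other."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for x, y in zip(xs, ys):
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, params)
        grads.append(torch.cat([gi.flatten() for gi in g]))
    G = torch.stack(grads)            # (n_samples, n_params)
    return G @ G.t()                  # (n_samples, n_samples) interaction kernel

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
    xs, ys = torch.randn(6, 10), torch.randint(0, 3, (6,))
    K = sample_interaction_kernel(model, xs, ys)
    print(K.shape)                    # torch.Size([6, 6])
```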
13. Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning
- Author
McCallum, Sabrina, Taylor-Davies, Max, Albrecht, Stefano V., and Suglia, Alessandro
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One possible way to help bridge this gap is to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the environment dynamics and goal condition success. Then, we modify the Decision Transformer architecture to take advantage of this additional signal. We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance, and that agents can benefit from feedback even when this is only available during training, but not at inference., Comment: Accepted at Workshop on Goal-conditioned Reinforcement Learning, NeurIPS 2023
- Published
- 2023
14. Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
- Author
McInroe, Trevor, Jelley, Adam, Albrecht, Stefano V., and Storkey, Amos
- Subjects
Computer Science - Machine Learning
- Abstract
Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm well matched to a real-world RL deployment process. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but we show this can unnecessarily limit policy performance if the behavior policy is far from optimal. Instead, we forgo constraints and frame OtO RL as an exploration problem that aims to maximize the benefit of online data-collection. We first study the major online RL exploration methods based on intrinsic rewards and UCB in the OtO setting, showing that intrinsic rewards add training instability through reward-function modification, and UCB methods are myopic and it is unclear which learned-component's ensemble to use for action selection. We then introduce an algorithm for planning to go out-of-distribution (PTGOOD) that avoids these issues. PTGOOD uses a non-myopic planning procedure that targets exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy without altering rewards. We show empirically in several continuous control tasks that PTGOOD significantly improves agent returns during online fine-tuning and avoids the suboptimal policy convergence that many of our baselines exhibit in several environments., Comment: 10 pages, 17 figures, published at RLC 2024
- Published
- 2023
15. How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
- Author
Garcin, Samuel, Doran, James, Guo, Shangmin, Lucas, Christopher G., and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods., Comment: Currently under review, 9 pages
- Published
- 2023
16. Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
- Author
Azran, Guy, Danesh, Mohamad H., Albrecht, Stefano V., and Keren, Sarah
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains., Comment: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), 2024
- Published
- 2023
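For readers unfamiliar with reward machines, the sketch below shows the basic object the entry above builds on: a small finite-state machine over high-level events whose transitions emit rewards. The task, states, and events are made up for illustration; the paper adds abstractions over such machines to aid transfer.

```python
class RewardMachine:
    """Minimal reward machine: a finite-state machine over high-level events.
    transitions maps (state, event) -> (next_state, reward)."""
    def __init__(self, transitions, initial_state):
        self.transitions = transitions
        self.state = initial_state

    def step(self, events):
        # events: the set of high-level propositions detected by a labelling function this step.
        for e in events:
            if (self.state, e) in self.transitions:
                self.state, reward = self.transitions[(self.state, e)]
                return reward
        return 0.0

if __name__ == "__main__":
    # Task: "get the key, then open the door".
    rm = RewardMachine(
        transitions={("u0", "key"): ("u1", 0.0), ("u1", "door"): ("u_done", 1.0)},
        initial_state="u0",
    )
    print(rm.step({"key"}), rm.state)     # 0.0 u1
    print(rm.step({"door"}), rm.state)    # 1.0 u_done
```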
17. Needs, rights and perspectives in the Birth Care Pathway during COVID-19 lockdown in Italy: the BiSogni Study, an exploratory qualitative research
- Author
Tambascia, G., Zambri, F., Sola, M. V., Marocco, S., Di Stefano, V., Marchetti, F., and Giusti, A.
- Published
- 2024
18. Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
- Author
Dunion, Mhairi, McInroe, Trevor, Luck, Kevin Sebastian, Hanna, Josiah P., and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning
- Abstract
Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features., Comment: Conference on Neural Information Processing Systems (NeurIPS), 2023
- Published
- 2023
19. SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning
- Author
Michalski, Adam, Christianos, Filippos, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
- Abstract
There is a lack of standard benchmarks for Multi-Agent Reinforcement Learning (MARL) algorithms. The Starcraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II. Thus, SMAC is computationally expensive and requires knowledge and the use of proprietary tools specific to the game for any meaningful alteration or contribution to the environment. We introduce SMAClite -- a challenge based on SMAC that is both decoupled from Starcraft II and open-source, along with a framework which makes it possible to create new content for SMAClite without any special knowledge. We conduct experiments to show that SMAClite is equivalent to SMAC, by training MARL algorithms on SMAClite and reproducing SMAC results. We then show that SMAClite outperforms SMAC in both runtime speed and memory.
- Published
- 2023
20. Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments
- Author
Andres, Alain, Schäfer, Lukas, Albrecht, Stefano V., and Del Ser, Javier
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered levels) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated., Comment: Initially presented at the Adaptive and Learning Agents Workshop (ALA) at the AAMAS conference 2023; the current extended version was accepted at Neurocomputing journal
- Published
- 2023
21. Revisiting the Gumbel-Softmax in MADDPG
- Author
Tilbury, Callum Rhys, Christianos, Filippos, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems, Statistics - Machine Learning
- Abstract
MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence., Comment: Presented at AAMAS Workshop on Adaptive and Learning Agents, 2023
- Published
- 2023
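The estimator under discussion is illustrated below using PyTorch's built-in straight-through Gumbel-Softmax: the forward pass produces a one-hot action while (biased) gradients flow through the relaxed sample, which is how MADDPG-style actors handle discrete actions. The alternative estimators examined in the paper would be dropped in at the same point; the per-action values used downstream here are a stand-in for a critic.

```python
import torch
import torch.nn.functional as F

def relaxed_discrete_action(logits: torch.Tensor, tau: float = 1.0, hard: bool = True):
    """Gumbel-Softmax (straight-through) sampling of a discrete action:
    one-hot in the forward pass, relaxed gradient in the backward pass."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard)

if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(4, 5, requires_grad=True)   # batch of 4, 5 discrete actions
    actions = relaxed_discrete_action(logits)
    print(actions)                                    # one-hot rows
    q_values = torch.randn(5)                         # stand-in for a critic's per-action values
    (actions @ q_values).sum().backward()             # gradients flow through the relaxation
    print(logits.grad.shape)                          # torch.Size([4, 5])
```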
22. Causal Explanations for Sequential Decision-Making in Multi-Agent Systems
- Author
Gyevnar, Balint, Wang, Cheng, Lucas, Christopher G., Cohen, Shay B., and Albrecht, Stefano V.
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Robotics, I.2.9
- Abstract
We present CEMA: Causal Explanations in Multi-Agent systems; a framework for creating causal natural language explanations of an agent's decisions in dynamic sequential multi-agent systems to build more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA only requires a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds that identify the salient causes behind the agent's decisions. We evaluate CEMA on the task of motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent's decisions, even when a large number of other agents is present, and show via a user study that CEMA's explanations have a positive effect on participants' trust in autonomous vehicles and are rated as high as high-quality baseline explanations elicited from other participants. We release the collected explanations with annotations as the HEADD dataset., Comment: Accepted in 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2024
- Published
- 2023
23. Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition
- Author
Fosong, Elliot, Rahman, Arrasy, Carlucho, Ignacio, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large joint policy space, and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks. In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We show empirically that such approaches can greatly reduce the number of timesteps required to solve a complex target task relative to training from-scratch. However, we also identify and investigate two problems with naive implementations of approaches based on sub-task decomposition, and propose a simple and scalable method to address these problems which augments existing actor-critic algorithms. We demonstrate the empirical benefits of our proposed method, enabling sub-task decomposition approaches to be deployed in diverse multi-agent tasks.
- Published
- 2023
24. Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning
- Author
Schäfer, Lukas, Slumbers, Oliver, McAleer, Stephen, Du, Yali, Albrecht, Stefano V., and Mguni, David
- Subjects
Computer Science - Multiagent Systems, Computer Science - Machine Learning
- Abstract
Multi-agent reinforcement learning (MARL) requires agents to explore within a vast joint action space to find joint actions that lead to coordination. Existing value-based MARL algorithms commonly rely on random exploration, such as $\epsilon$-greedy, which is neither systematic nor efficient at identifying effective actions in multi-agent problems. Additionally, the concurrent training of the policies of multiple agents can render the optimisation non-stationary. This can lead to unstable value estimates and high-variance gradients, and ultimately hinder coordination between agents. To address these challenges, we propose ensemble value functions for multi-agent exploration (EMAX). EMAX is a framework to seamlessly extend value-based MARL algorithms. EMAX leverages an ensemble of value functions for each agent to guide their exploration, reduce the variance of their optimisation, and make their policies more robust to miscoordination. EMAX achieves these benefits by (1) systematically guiding the exploration of agents with a UCB policy towards parts of the environment that require multiple agents to coordinate; (2) computing average value estimates across the ensemble as target values to reduce the variance of gradients and make optimisation more stable; and (3) selecting actions during evaluation by a majority vote across the ensemble to reduce the likelihood of miscoordination. We first instantiate independent DQN with EMAX and evaluate it in 11 general-sum tasks with sparse rewards. We show that EMAX improves final evaluation returns by 185% across all tasks. We then evaluate EMAX on top of IDQN, VDN and QMIX in 21 common-reward tasks, and show that EMAX improves sample efficiency and final evaluation returns across all tasks over all three vanilla algorithms by 60%, 47%, and 538%, respectively., Comment: Published at 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025)
- Published
- 2023
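A compact sketch of how an ensemble of value functions can serve the three roles described above (UCB-guided exploration, averaged bootstrap targets, majority-vote evaluation), under assumed network sizes and without the surrounding MARL training loop; it is not the authors' code.

```python
import torch
import torch.nn as nn

class EnsembleQ(nn.Module):
    """Illustrative per-agent ensemble of Q-networks."""
    def __init__(self, obs_dim, n_actions, n_members=5):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
            for _ in range(n_members)])

    def forward(self, obs):                        # (B, obs_dim) -> (M, B, A)
        return torch.stack([m(obs) for m in self.members])

    def ucb_action(self, obs, c=1.0):              # exploration during training
        q = self.forward(obs)
        return (q.mean(0) + c * q.std(0)).argmax(-1)

    def target(self, obs):                         # lower-variance bootstrap target
        return self.forward(obs).mean(0).max(-1).values

    def vote_action(self, obs):                    # evaluation: majority vote of greedy actions
        greedy = self.forward(obs).argmax(-1)      # (M, B)
        return greedy.mode(dim=0).values

if __name__ == "__main__":
    torch.manual_seed(0)
    q = EnsembleQ(obs_dim=8, n_actions=4)
    obs = torch.randn(3, 8)
    print(q.ucb_action(obs), q.target(obs), q.vote_action(obs))
```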
25. Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers
- Author
Krnjaic, Aleksandar, Steleac, Raul D., Thomas, Jonathan D., Papoudakis, Georgios, Schäfer, Lukas, To, Andrew Wing Keung, Lao, Kuan-Ho, Cubuktepe, Murat, Haley, Matthew, Börsting, Peter, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems, Computer Science - Robotics
- Abstract
We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), and different types of order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms and overall pick rates over multiple established industry heuristics in a diverse set of warehouse configurations and different order-picking paradigms., Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
- Published
- 2022
26. Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models
- Author
Christianos, Filippos, Karkus, Peter, Ivanovic, Boris, Albrecht, Stefano V., and Pavone, Marco
- Subjects
Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles. Recent deep learning models have shown impressive results for predicting occluded agents based on the behaviour of nearby visible agents; however, as we show in experiments, these models are difficult to integrate into downstream planning. To this end, we propose Bi-level Variational Occlusion Models (BiVO), a two-step generative model that first predicts likely locations of occluded agents, and then generates likely trajectories for the occluded agents. In contrast to existing methods, BiVO outputs a trajectory distribution which can then be sampled from and integrated into standard downstream planning. We evaluate the method in closed-loop replay simulation using the real-world nuScenes dataset. Our results suggest that BiVO can successfully learn to predict occluded agent trajectories, and these predictions lead to better subsequent motion plans in critical scenarios., Comment: 7 pages, 6 figures
- Published
- 2022
27. DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving
- Author
Knittel, Anthony, Hawasly, Majd, Albrecht, Stefano V., Redford, John, and Ramamoorthy, Subramanian
- Subjects
Computer Science - Robotics
- Abstract
Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Prediction must be fast, to support multiple requests from a planner exploring a range of possible futures. The generated predictions must accurately represent the probabilities of predicted trajectories, while also capturing different modes of behaviour (such as turning left vs continuing straight at a junction). To this end, we present DiPA, an interactive predictor that addresses these challenging requirements. Previous interactive prediction methods use an encoding of k-mode-samples, which under-represents the full distribution. Other methods optimise closest-mode evaluations, which test whether one of the predictions is similar to the ground-truth, but allow additional unlikely predictions to occur, over-representing unlikely predictions. DiPA addresses these limitations by using a Gaussian-Mixture-Model to encode the full distribution, and optimising predictions using both probabilistic and closest-mode measures. These objectives respectively optimise probabilistic accuracy and the ability to capture distinct behaviours, and there is a challenging trade-off between them. We are able to solve both together using a novel training regime. DiPA achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used. This demonstrates effective prediction for supporting a planner on interactive scenarios.
- Published
- 2022
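A small sketch of the two kinds of objectives mentioned in the DiPA entry above, assuming a single 2D position target per sample: a Gaussian-mixture negative log-likelihood (probabilistic accuracy) and a closest-mode error (capturing distinct behaviours). The paper balances richer versions of both over full trajectories; shapes and names here are illustrative.

```python
import torch
import torch.distributions as D

def gmm_trajectory_nll(logits, means, log_stds, targets):
    """Probabilistic term: score the ground-truth position under a Gaussian mixture.
    Shapes: logits (B, K), means/log_stds (B, K, 2), targets (B, 2)."""
    mix = D.Categorical(logits=logits)
    comp = D.Independent(D.Normal(means, log_stds.exp()), 1)
    gmm = D.MixtureSameFamily(mix, comp)
    return -gmm.log_prob(targets).mean()

def closest_mode_error(means, targets):
    """Closest-mode term: distance from the target to the nearest mixture mean."""
    return (means - targets.unsqueeze(1)).norm(dim=-1).min(dim=1).values.mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    B, K = 4, 3
    logits, means, log_stds = torch.randn(B, K), torch.randn(B, K, 2), torch.zeros(B, K, 2)
    targets = torch.randn(B, 2)
    print(float(gmm_trajectory_nll(logits, means, log_stds, targets)),
          float(closest_mode_error(means, targets)))
```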
28. A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning
- Author
Rahman, Arrasy, Carlucho, Ignacio, Höpner, Niklas, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
- Abstract
Open ad hoc teamwork is the problem of training a single agent to efficiently collaborate with an unknown group of teammates whose composition may change over time. A variable team composition creates challenges for the agent, such as the requirement to adapt to new team dynamics and dealing with changing state vector sizes. These challenges are aggravated in real-world applications in which the controlled agent only has a partial view of the environment. In this work, we develop a class of solutions for open ad hoc teamwork under full and partial observability. We start by developing a solution for the fully observable case that leverages graph neural network architectures to obtain an optimal policy based on reinforcement learning. We then extend this solution to partially observable scenarios by proposing different methodologies that maintain belief estimates over the latent environment states and team composition. These belief estimates are combined with our solution for the fully observable case to compute an agent's optimal policy under partial observability in open ad hoc teamwork. Empirical results demonstrate that our solution can learn efficient policies in open ad hoc teamwork in fully and partially observable cases. Further analysis demonstrates that our methods' success is a result of effectively learning the effects of teammates' actions while also inferring the inherent state of the environment under partial observability.
- Published
- 2022
29. Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
- Author
Christianos, Filippos, Papoudakis, Georgios, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Multiagent Systems
- Abstract
This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and, therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to efficiently scale in games with a large number of agents., Comment: Published in Transactions on Machine Learning Research (TMLR); Reviewed on OpenReview: https://openreview.net/forum?id=3AzqYa18ah
- Published
- 2022
30. Deep Reinforcement Learning for Multi-Agent Interaction
- Author
Ahmed, Ibrahim H., Brewitt, Cillian, Carlucho, Ignacio, Christianos, Filippos, Dunion, Mhairi, Fosong, Elliot, Garcin, Samuel, Guo, Shangmin, Gyevnar, Balint, McInroe, Trevor, Papoudakis, Georgios, Rahman, Arrasy, Schäfer, Lukas, Tamborski, Massimiliano, Vecchio, Giuseppe, Wang, Cheng, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions., Comment: Published in AI Communications Special Issue on Multi-Agent Systems Research in the UK
- Published
- 2022
31. Perspectives on the System-level Design of a Safe Autonomous Driving Stack
- Author
Hawasly, Majd, Sadeghi, Jonathan, Antonello, Morris, Albrecht, Stefano V., Redford, John, and Ramamoorthy, Subramanian
- Subjects
Computer Science - Robotics, Computer Science - Multiagent Systems
- Abstract
Achieving safe and robust autonomy is the key bottleneck on the path towards broader adoption of autonomous vehicles technology. This motivates going beyond extrinsic metrics such as miles between disengagement, and calls for approaches that embody safety by design. In this paper, we address some aspects of this challenge, with emphasis on issues of motion planning and prediction. We do this through description of novel approaches taken to solving selected sub-problems within an autonomous driving stack, in the process introducing the design philosophy being adopted within Five. This includes safe-by-design planning, interpretable as well as verifiable prediction, and modelling of perception errors to enable effective sim-to-real and real-to-sim transfer within the testing pipeline of a realistic autonomous system., Comment: AI Communications special issue on Multi-agent Systems Research in the UK
- Published
- 2022
32. Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity
- Author
Rahman, Arrasy, Fosong, Elliot, Carlucho, Ignacio, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Ad hoc teamwork (AHT) is the challenge of designing a robust learner agent that effectively collaborates with unknown teammates without prior coordination mechanisms. Early approaches address the AHT challenge by training the learner with a diverse set of handcrafted teammate policies, usually designed based on an expert's domain knowledge about the policies the learner may encounter. However, implementing teammate policies for training based on domain knowledge is not always feasible. In such cases, recent approaches attempted to improve the robustness of the learner by training it with teammate policies generated by optimising information-theoretic diversity metrics. The problem with optimising existing information-theoretic diversity metrics for teammate policy generation is the emergence of superficially different teammates. When used for AHT training, superficially different teammate behaviours may not improve a learner's robustness during collaboration with unknown teammates. In this paper, we present an automated teammate policy generation method optimising the Best-Response Diversity (BRDiv) metric, which measures diversity based on the compatibility of teammate policies in terms of returns. We evaluate our approach in environments with multiple valid coordination strategies, comparing against methods optimising information-theoretic diversity metrics and an ablation not optimising any diversity metric. Our experiments indicate that optimising BRDiv yields a diverse set of training teammate policies that improve the learner's performance relative to previous teammate generation approaches when collaborating with near-optimal previously unseen teammate policies., Comment: Accepted in Transactions of Machine Learning Research
- Published
- 2022
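One crude way to see the return-based intuition behind the entry above is sketched below, using a hypothetical cross-play return matrix: genuinely diverse teammates make best responses non-interchangeable (strong diagonal, weak off-diagonal), while superficially different teammates do not. The score here only illustrates the intuition and is not the BRDiv metric from the paper.

```python
import numpy as np

def return_based_diversity(returns: np.ndarray) -> float:
    """returns[i, j] = episode return when the best response trained with teammate i
    is paired with teammate j. Higher scores mean best responses are less interchangeable."""
    n = returns.shape[0]
    diag = np.trace(returns) / n
    off = (returns.sum() - np.trace(returns)) / (n * (n - 1))
    return float(diag - off)

if __name__ == "__main__":
    # Three teammate policies requiring genuinely different coordination strategies.
    distinct = np.array([[1.0, 0.1, 0.0],
                         [0.2, 1.0, 0.1],
                         [0.0, 0.2, 1.0]])
    # Superficially different teammates: any best response works with any of them.
    redundant = np.ones((3, 3))
    print(return_based_diversity(distinct), return_based_diversity(redundant))   # high vs. 0.0
```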
33. Few-Shot Teamwork
- Author
Fosong, Elliot, Rahman, Arrasy, Carlucho, Ignacio, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
- Abstract
We propose the novel few-shot teamwork (FST) problem, where skilled agents trained in a team to complete one task are combined with skilled agents from different tasks, and together must learn to adapt to an unseen but related task. We discuss how the FST problem can be seen as addressing two separate problems: one of reducing the experience required to train a team of agents to complete a complex task; and one of collaborating with unfamiliar teammates to complete a new task. Progress towards solving FST could lead to progress in both multi-agent reinforcement learning and ad hoc teamwork., Comment: IJCAI Workshop on Ad Hoc Teamwork, 2022
- Published
- 2022
34. Cooperative Marine Operations via Ad Hoc Teams
- Author
Carlucho, Ignacio, Rahman, Arrasy, Ard, William, Fosong, Elliot, Barbalata, Corina, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems
- Abstract
While research in ad hoc teamwork has great potential for solving real-world robotic applications, most developments so far have been focusing on environments with simple dynamics. In this article, we discuss how the problem of ad hoc teamwork can be of special interest for marine robotics and how it can aid marine operations. Particularly, we present a set of challenges that need to be addressed for achieving ad hoc teamwork in underwater environments and we discuss possible solutions based on current state-of-the-art developments in the ad hoc teamwork literature.
- Published
- 2022
35. Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning
- Author
Dunion, Mhairi, McInroe, Trevor, Luck, Kevin Sebastian, Hanna, Josiah P., and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning
- Abstract
Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image. The changed pixels can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, our experiments also show that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions)., Comment: International Conference on Learning Representations (ICLR), 2023
- Published
- 2022
36. Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning
- Author
Schäfer, Lukas, Christianos, Filippos, Storkey, Amos, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks., Comment: To be presented at the Seventh Workshop on Generalization in Planning at the NeurIPS 2023 conference
- Published
- 2022
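A minimal sketch of a task-embedding model in the spirit of the MATE entry above, under assumptions about the architecture: a recurrent encoder summarises a trajectory into an embedding, and a decoder must reconstruct rewards and next observations from it, so the embedding has to carry task-identifying information. The independent, centralised, and mixed variants differ in what each agent encodes; this covers only the core objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskEmbeddingSketch(nn.Module):
    """Encode a trajectory of (obs, action, reward, next_obs) steps into a task embedding
    trained to reconstruct the transition and reward signals."""
    def __init__(self, obs_dim, act_dim, embed_dim=8):
        super().__init__()
        step_dim = 2 * obs_dim + act_dim + 1
        self.encoder = nn.GRU(step_dim, embed_dim, batch_first=True)
        self.decoder = nn.Sequential(nn.Linear(embed_dim + obs_dim + act_dim, 64), nn.ReLU(),
                                     nn.Linear(64, obs_dim + 1))   # predict next_obs and reward

    def forward(self, obs, act, rew, next_obs):
        # obs/next_obs: (B, T, obs_dim), act: (B, T, act_dim), rew: (B, T, 1)
        steps = torch.cat([obs, act, rew, next_obs], dim=-1)
        _, h = self.encoder(steps)
        z = h[-1]                                                   # (B, embed_dim) task embedding
        z_rep = z.unsqueeze(1).expand(-1, obs.size(1), -1)
        pred = self.decoder(torch.cat([z_rep, obs, act], dim=-1))
        target = torch.cat([next_obs, rew], dim=-1)
        return z, F.mse_loss(pred, target)                          # embedding + reconstruction loss

if __name__ == "__main__":
    torch.manual_seed(0)
    m = TaskEmbeddingSketch(obs_dim=6, act_dim=2)
    B, T = 4, 10
    z, loss = m(torch.randn(B, T, 6), torch.randn(B, T, 2), torch.randn(B, T, 1), torch.randn(B, T, 6))
    print(z.shape, float(loss))    # torch.Size([4, 8]) ...
```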
37. Verifiable Goal Recognition for Autonomous Driving with Occlusions
- Author
Brewitt, Cillian, Tamborski, Massimiliano, Wang, Cheng, and Albrecht, Stefano V.
- Subjects
Computer Science - Robotics, Computer Science - Machine Learning
- Abstract
Goal recognition (GR) involves inferring the goals of other vehicles, such as a certain junction exit, which can enable more accurate prediction of their future behaviour. In autonomous driving, vehicles can encounter many different scenarios and the environment may be partially observable due to occlusions. We present a novel GR method named Goal Recognition with Interpretable Trees under Occlusion (OGRIT). OGRIT uses decision trees learned from vehicle trajectory data to infer the probabilities of a set of generated goals. We demonstrate that OGRIT can handle missing data due to occlusions and make inferences across multiple scenarios using the same learned decision trees, while being computationally fast, accurate, interpretable and verifiable. We also release the inDO, rounDO and OpenDDO datasets of occluded regions used to evaluate OGRIT., Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
- Published
- 2022
38. Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning
- Author
McInroe, Trevor, Schäfer, Lukas, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning
- Abstract
Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions, which may cause learning inefficiencies if important environmental changes take many steps to manifest. We propose Hierarchical $k$-Step Latent (HKSL), an auxiliary task that learns multiple representations via a hierarchy of forward models that learn to communicate and an ensemble of $n$-step critics that all operate at varying magnitudes of step skipping. We evaluate HKSL in a suite of 30 robotic control tasks with and without distractors and a task of our creation. We find that HKSL either converges to higher or optimal episodic returns more quickly than several alternative representation learning approaches. Furthermore, we find that HKSL's representations capture task-relevant details accurately across timescales (even in the presence of distractors) and that communication channels between hierarchy levels organize information based on both sides of the communication process, both of which improve sample efficiency., Comment: Published in TMLR
- Published
- 2022
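A minimal PyTorch caricature of the hierarchical forward-model idea in the entry above: two levels predict the encoder's latent representation one and three steps ahead, conditioned on the intervening actions, and their prediction losses are summed. The dimensions, the two-level setup, and the linear stand-in encoder are assumptions; the communication between levels and the ensemble of $n$-step critics described in the abstract are omitted.

import torch
import torch.nn as nn

class LatentForward(nn.Module):
    # Predicts the latent k steps ahead, conditioned on the k actions taken in between.
    def __init__(self, latent_dim, act_dim, k):
        super().__init__()
        self.k = k
        self.net = nn.Sequential(
            nn.Linear(latent_dim + k * act_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z, actions):                       # actions: (batch, k, act_dim)
        flat = actions.reshape(actions.shape[0], -1)
        return self.net(torch.cat([z, flat], dim=-1))

latent_dim, act_dim = 32, 4
encoder = nn.Linear(64, latent_dim)                      # stand-in for a pixel encoder
levels = {1: LatentForward(latent_dim, act_dim, 1),
          3: LatentForward(latent_dim, act_dim, 3)}

obs_t = torch.randn(16, 64)
obs_future = {k: torch.randn(16, 64) for k in levels}    # observations k steps later
actions = {k: torch.randn(16, k, act_dim) for k in levels}

# Each level predicts the encoder's latent k steps ahead; losses are summed across levels.
z_t = encoder(obs_t)
loss = sum(
    nn.functional.mse_loss(model(z_t, actions[k]), encoder(obs_future[k]).detach())
    for k, model in levels.items()
)
loss.backward()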
39. A Human-Centric Method for Generating Causal Explanations in Natural Language for Autonomous Vehicle Motion Planning
- Author
-
Gyevnar, Balint, Tamborski, Massimiliano, Wang, Cheng, Lucas, Christopher G., Cohen, Shay B., and Albrecht, Stefano V.
- Subjects
Computer Science - Robotics - Abstract
Inscrutable AI systems are difficult to trust, especially if they operate in safety-critical settings like autonomous driving. Therefore, there is a need to build transparent and queryable systems to increase trust levels. We propose a transparent, human-centric explanation generation method for autonomous vehicle motion planning and prediction based on an existing white-box system called IGP2. Our method integrates Bayesian networks with context-free generative rules and can give causal natural language explanations for the high-level driving behaviour of autonomous vehicles. Preliminary testing on simulated scenarios shows that our method captures the causes behind the actions of autonomous vehicles and generates intelligible explanations with varying complexity., Comment: IJCAI Workshop on Artificial Intelligence for Autonomous Driving (AI4AD), 2022
- Published
- 2022
40. MIDGARD: A Simulation Platform for Autonomous Navigation in Unstructured Environments
- Author
-
Vecchio, Giuseppe, Palazzo, Simone, Guastella, Dario C., Carlucho, Ignacio, Albrecht, Stefano V., Muscato, Giovanni, and Spampinato, Concetto
- Subjects
Computer Science - Robotics - Abstract
We present MIDGARD, an open-source simulation platform for autonomous robot navigation in outdoor unstructured environments. MIDGARD is designed to enable the training of autonomous agents (e.g., unmanned ground vehicles) in photorealistic 3D environments, and to support the generalization skills of learning-based agents through the variability in training scenarios. MIDGARD's main features include a configurable, extensible, and difficulty-driven procedural landscape generation pipeline, with fast and photorealistic scene rendering based on Unreal Engine. Additionally, MIDGARD has built-in support for OpenAI Gym, a programming interface for feature extension (e.g., integrating new types of sensors, customizing and exposing internal simulation variables), and a variety of simulated agent sensors (e.g., RGB, depth and instance/semantic segmentation). We evaluate MIDGARD's capabilities as a benchmarking tool for robot navigation utilizing a set of state-of-the-art reinforcement learning algorithms. The results demonstrate MIDGARD's suitability as a simulation and training environment, as well as the effectiveness of our procedural generation approach in controlling scene difficulty, which directly reflects on accuracy metrics. MIDGARD build, source code and documentation are available at https://midgardsim.org/.
- Published
- 2022
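Since the entry above states that MIDGARD has built-in OpenAI Gym support, interacting with it would presumably look like a standard Gym loop such as the sketch below. The environment ID "Midgard-Unstructured-v0" is hypothetical, and the classic Gym API (reset returning an observation, step returning a 4-tuple) is assumed.

import gym

# Hypothetical environment ID; the actual registration names are not given in the abstract.
env = gym.make("Midgard-Unstructured-v0")

obs = env.reset()                                  # classic Gym API assumed
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()             # replace with a trained policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)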
41. Flash: Fast and Light Motion Prediction for Autonomous Driving with Bayesian Inverse Planning and Learned Motion Profiles
- Author
-
Antonello, Morris, Dobre, Mihai, Albrecht, Stefano V., Redford, John, and Ramamoorthy, Subramanian
- Subjects
Computer Science - Robotics - Abstract
Motion prediction of road users in traffic scenes is critical for autonomous driving systems that must take safe and robust decisions in complex dynamic environments. We present a novel motion prediction system for autonomous driving. Our system is based on the Bayesian inverse planning framework, which efficiently orchestrates map-based goal extraction, a classical control-based trajectory generator, and a mixture-of-experts collection of light-weight neural networks specialised in motion profile prediction. In contrast to many alternative methods, this modularity helps isolate performance factors and better interpret results, without compromising performance. This system addresses multiple aspects of interest, namely multi-modality, motion profile uncertainty and trajectory physical feasibility. We report on several experiments with the popular highway dataset NGSIM, demonstrating state-of-the-art performance in terms of trajectory error. We also perform a detailed analysis of our system's components, along with experiments that stratify the data based on behaviours, such as change-lane versus follow-lane, to provide insights into the challenges in this domain. Finally, we present a qualitative analysis to show other benefits of our approach, such as the ability to interpret the outputs., Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022. 8 pages
- Published
- 2022
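At the heart of the Bayesian inverse planning framework mentioned in the entry above is a posterior over candidate goals, proportional to a goal prior times the likelihood of the observed trajectory under each goal. The short Python sketch below shows only that normalisation step; the priors and log-likelihoods are made-up numbers, and the goal extraction, trajectory generator, and motion-profile networks are not reproduced.

import numpy as np

def goal_posterior(priors, log_likelihoods):
    # Bayesian inverse planning: P(goal | trajectory) ∝ P(goal) * P(trajectory | goal).
    log_post = np.log(priors) + log_likelihoods
    log_post -= log_post.max()                    # for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: three candidate goals, with likelihoods reflecting how closely a planned
# trajectory towards each goal matches the observed one.
priors = np.array([0.4, 0.3, 0.3])
log_likelihoods = np.array([-2.1, -0.4, -3.0])
print(goal_posterior(priors, log_likelihoods))    # goal 1 dominates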
42. A Survey of Ad Hoc Teamwork Research
- Author
-
Mirsky, Reuth, Carlucho, Ignacio, Rahman, Arrasy, Fosong, Elliot, Macke, William, Sridharan, Mohan, Stone, Peter, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems ,Computer Science - Artificial Intelligence - Abstract
Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination. This survey makes a two-fold contribution: First, it provides a structured description of the different facets of the ad hoc teamwork problem. Second, it discusses the progress that has been made in the field so far, and identifies the immediate and long-term open problems that need to be addressed in ad hoc teamwork., Comment: European Conference on Multi-Agent Systems (EUMAS), 2022
- Published
- 2022
43. Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
- Author
-
Zhong, Rujie, Zhang, Duohan, Schäfer, Lukas, Albrecht, Stefano V., and Hanna, Josiah P.
- Subjects
Computer Science - Machine Learning - Abstract
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution compared to on-policy sampling. Empirically, we show that this faster convergence leads to lower mean squared error policy value estimates., Comment: Published in 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
- Published
- 2021
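The following toy sketch conveys the intuition behind non-i.i.d. sampling that tracks the on-policy distribution, as discussed in the entry above: at each step it picks the discrete action whose empirical frequency most lags its target probability, so the empirical distribution converges to the target faster than i.i.d. sampling. This greedy count-correction rule is an illustrative assumption, not the Robust On-Policy Sampling method itself.

import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.7, 0.2, 0.1])                    # target policy over 3 discrete actions
counts = np.zeros(3)

def corrective_sample(pi, counts):
    # Pick the action whose empirical frequency lags its target probability the most,
    # so the empirical distribution tracks pi faster than i.i.d. on-policy sampling.
    n = counts.sum()
    if n == 0:
        return rng.choice(len(pi), p=pi)
    deficit = pi - counts / n
    return int(np.argmax(deficit))

for _ in range(1000):
    counts[corrective_sample(pi, counts)] += 1

print("target:", pi, "empirical:", counts / counts.sum())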
44. Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning
- Author
-
McInroe, Trevor, Schäfer, Lukas, and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Deep reinforcement learning (RL) agents that exist in high-dimensional state spaces, such as those composed of images, have interconnected learning burdens. Agents must learn an action-selection policy that completes their given task, which requires them to learn a representation of the state space that discerns between useful and useless information. The reward function is the only supervised feedback that RL agents receive, which causes a representation learning bottleneck that can manifest in poor sample efficiency. We present $k$-Step Latent (KSL), a new representation learning method that enforces temporal consistency of representations via a self-supervised auxiliary task wherein agents learn to recurrently predict action-conditioned representations of the state space. The state encoder learned by KSL produces low-dimensional representations that make optimization of the RL task more sample efficient. Altogether, KSL produces state-of-the-art results in both data efficiency and asymptotic performance in the popular PlaNet benchmark suite. Our analyses show that KSL produces encoders that generalize better to new tasks unseen during training, and its representations are more strongly tied to reward, are more invariant to perturbations in the state space, and move more smoothly through the temporal axis of the RL problem than other methods such as DrQ, RAD, CURL, and SAC-AE.
- Published
- 2021
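A small PyTorch sketch of the temporal-consistency objective described in the entry above: a recurrent cell rolls the encoded latent forward under the executed actions and is trained to match a target encoding of each true future observation. The linear stand-ins for the pixel encoders, the GRU cell, and the horizon $k=3$ are assumptions for illustration rather than the paper's architecture.

import torch
import torch.nn as nn

latent_dim, act_dim, k = 32, 4, 3
encoder = nn.Linear(64, latent_dim)               # stand-in for a convolutional encoder
target_encoder = nn.Linear(64, latent_dim)        # e.g. a slowly-updated copy in practice
cell = nn.GRUCell(act_dim, latent_dim)            # recurrent action-conditioned transition

obs = torch.randn(16, k + 1, 64)                  # o_t ... o_{t+k}
actions = torch.randn(16, k, act_dim)             # a_t ... a_{t+k-1}

# Roll the latent forward one action at a time and match the target encoding
# of the true future observation at every step (temporal consistency).
z = encoder(obs[:, 0])
loss = 0.0
for i in range(k):
    z = cell(actions[:, i], z)
    loss = loss + nn.functional.mse_loss(z, target_encoder(obs[:, i + 1]).detach())
loss.backward()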
45. Hydrogen reionisation ends by $z=5.3$: Lyman-$\alpha$ optical depth measured by the XQR-30 sample
- Author
-
Bosman, Sarah E. I., Davies, Frederick B., Becker, George D., Keating, Laura C., Davies, Rebecca L., Zhu, Yongda, Eilers, Anna-Christina, D'Odorico, Valentina, Bian, Fuyan, Bischetti, Manuela, Cristiani, Stefano V., Fan, Xiaohui, Farina, Emanuele P., Haehnelt, Martin G., Hennawi, Joseph F., Kulkarni, Girish, Mesinger, Andrei, Meyer, Romain A., Onoue, Masafusa, Pallottini, Andrea, Qin, Yuxiang, Ryan-Weber, Emma, Schindler, Jan-Torge, Walter, Fabian, Wang, Feige, and Yang, Jinyi
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics - Abstract
The presence of excess scatter in the Ly-$\alpha$ forest at $z\sim 5.5$, together with the existence of sporadic extended opaque Gunn-Peterson troughs, has started to provide robust evidence for a late end of hydrogen reionisation. However, low data quality and systematic uncertainties complicate the use of Ly-$\alpha$ transmission as a precision probe of reionisation's end stages. In this paper, we assemble a sample of 67 quasar sightlines at $z>5.5$ with high signal-to-noise ratios of $>10$ per $\leq 15$ km s$^{-1}$ spectral pixel, relying largely on the new XQR-30 quasar sample. XQR-30 is a large program on VLT/X-Shooter which obtained deep (SNR $>20$ per pixel) spectra of 30 quasars at $z>5.7$. We carefully account for systematics in continuum reconstruction, instrumentation, and contamination by damped Ly-$\alpha$ systems. We present improved measurements of the mean Ly-$\alpha$ transmission over $4.9 < z < 6.1$ [...] ($>3.5\sigma$). Our results indicate that reionisation-related fluctuations, whether in the UVB, residual neutral hydrogen fraction, and/or IGM temperature, persist in the intergalactic medium until at least $z=5.3$ ($t=1.1$ Gyr after the Big Bang). This is further evidence for a late end to reionisation., Comment: Accepted for publication in MNRAS. Full measurement datasets are available on the first author's website
- Published
- 2021
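For reference, the Lyman-$\alpha$ effective optical depth referred to in the title of the entry above is conventionally defined from the mean normalised transmitted flux as $\tau_{\rm eff} = -\ln \langle F \rangle$. The Python snippet below computes it over bins of transmission values; the bin length and the random toy data are illustrative and unrelated to the paper's measurements.

import numpy as np

def effective_optical_depth(transmission, bin_size=50):
    # tau_eff = -ln(<F>), computed over bins of normalised transmitted flux F.
    transmission = np.asarray(transmission)
    n_bins = len(transmission) // bin_size
    binned = transmission[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
    return -np.log(binned)

# Toy normalised Ly-alpha forest transmission values in (0, 1].
rng = np.random.default_rng(0)
flux = np.clip(rng.normal(0.1, 0.05, size=500), 1e-3, 1.0)
print(effective_optical_depth(flux))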
46. Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles
- Author
-
Hanna, Josiah P., Rahman, Arrasy, Fosong, Elliot, Eiras, Francisco, Dobre, Mihai, Redford, John, Ramamoorthy, Subramanian, and Albrecht, Stefano V.
- Subjects
Computer Science - Robotics - Abstract
Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rational behaviour with respect to inferred goals may fail to make accurate long-horizon predictions because they ignore the possibility that the behaviour is influenced by such unseen entities. We introduce the Goal and Occluded Factor Inference (GOFI) algorithm which bases inference on inverse-planning to jointly infer a probabilistic belief over goals and potential occluded factors. We then show how these beliefs can be integrated into Monte Carlo Tree Search (MCTS). We demonstrate that jointly inferring goals and occluded factors leads to more accurate beliefs with respect to the true world state and allows an agent to safely navigate several scenarios where other baselines take unsafe actions leading to collisions., Comment: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)
- Published
- 2021
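The entry above describes jointly inferring goals and occluded factors by inverse planning. A minimal sketch of such a joint belief update is shown below: the posterior over (goal, occluded factor) pairs is proportional to independent priors times a likelihood of the observations under each hypothesis pair. The priors, the two-by-two hypothesis space, and the log-likelihood values are invented for illustration; the inverse-planning likelihoods and the MCTS integration are not reproduced.

import numpy as np

def joint_belief(goal_prior, factor_prior, log_lik):
    # P(goal, factor | observations) ∝ P(goal) P(factor) P(observations | goal, factor),
    # where the likelihood would come from inverse planning under each hypothesis pair.
    log_post = np.log(goal_prior)[:, None] + np.log(factor_prior)[None, :] + log_lik
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

goal_prior = np.array([0.5, 0.5])                 # goals: junction exit vs. continue straight
factor_prior = np.array([0.8, 0.2])               # occluded factor: absent vs. present
log_lik = np.array([[-3.0, -2.8],                 # observed slowing is best explained by
                    [-2.5, -0.5]])                # "continue straight" with a hidden factor
print(joint_belief(goal_prior, factor_prior, log_lik))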
47. Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
- Author
-
Schäfer, Lukas, Christianos, Filippos, Hanna, Josiah P., and Albrecht, Stefano V.
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies., Comment: Published at the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2022
- Published
- 2021
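As a rough sketch of the decoupling described in the entry above, the code below maintains two learners: an exploration policy trained on extrinsic plus scaled intrinsic reward, which generates all experience, and an exploitation policy trained on extrinsic reward only. The tabular Q-learners, the bonus weight beta, and the update routine are stand-in assumptions, not the DeRL algorithms from the paper.

import random

class TabularQPolicy:
    # Minimal tabular Q-learner used only to make the sketch self-contained.
    def __init__(self, n_actions, lr=0.1, gamma=0.99, eps=0.1):
        self.q, self.n_actions, self.lr, self.gamma, self.eps = {}, n_actions, lr, gamma, eps

    def act(self, obs):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q.get((obs, a), 0.0))

    def update(self, obs, action, reward, next_obs, done):
        target = reward if done else reward + self.gamma * max(
            self.q.get((next_obs, a), 0.0) for a in range(self.n_actions))
        old = self.q.get((obs, action), 0.0)
        self.q[(obs, action)] = old + self.lr * (target - old)

explore_pi, exploit_pi = TabularQPolicy(4), TabularQPolicy(4)

def derl_style_step(obs, env_step, intrinsic_bonus, beta=0.5):
    # The exploration policy acts and learns from extrinsic + intrinsic reward;
    # the exploitation policy learns from the same data using extrinsic reward only.
    action = explore_pi.act(obs)
    next_obs, r_ext, done = env_step(obs, action)
    r_int = intrinsic_bonus(obs, action, next_obs)
    explore_pi.update(obs, action, r_ext + beta * r_int, next_obs, done)
    exploit_pi.update(obs, action, r_ext, next_obs, done)
    return next_obs, done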
48. Expressivity of Emergent Language is a Trade-off between Contextual Complexity and Unpredictability
- Author
-
Guo, Shangmin, Ren, Yi, Mathewson, Kory, Kirby, Simon, Albrecht, Stefano V., and Smith, Kenny
- Subjects
Computer Science - Computation and Language - Abstract
Researchers are using deep learning models to explore the emergence of language in various language games, where agents interact and develop an emergent language to solve tasks. We focus on the factors that determine the expressivity of emergent languages, which reflects the amount of information about input spaces those languages are capable of encoding. We measure the expressivity of emergent languages based on the generalisation performance across different games, and demonstrate that the expressivity of emergent languages is a trade-off between the complexity and unpredictability of the context those languages emerged from. Another contribution of this work is the discovery of message type collapse, i.e., the number of unique messages used is lower than the number of inputs. We also show that using the contrastive loss proposed by Chen et al. (2020) can alleviate this problem., Comment: 22 pages, 12 figures, 5 tables
- Published
- 2021
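Message type collapse, as described in the entry above, can be checked directly by comparing the number of distinct messages a speaker emits with the number of distinct inputs it is asked to encode. The toy Python check below uses invented inputs and messages purely to illustrate the measurement.

from collections import Counter

def message_type_collapse(inputs, messages):
    # "Message type collapse": the number of distinct messages the speaker uses
    # is lower than the number of distinct inputs it is asked to describe.
    n_inputs = len(set(inputs))
    n_messages = len(set(messages))
    return n_inputs, n_messages, n_messages < n_inputs

# Toy speaker that maps 6 distinct inputs onto only 3 message types.
inputs = ["red-circle", "red-square", "blue-circle", "blue-square", "green-circle", "green-square"]
messages = ["aa", "aa", "bb", "bb", "cc", "cc"]
print(message_type_collapse(inputs, messages))    # (6, 3, True)
print(Counter(messages))                          # how often each message type is reused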
49. GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving
- Author
-
Brewitt, Cillian, Gyevnar, Balint, Garcin, Samuel, and Albrecht, Stefano V.
- Subjects
Computer Science - Robotics ,Computer Science - Multiagent Systems - Abstract
It is important for autonomous vehicles to have the ability to infer the goals of other vehicles (goal recognition), in order to safely interact with other vehicles and predict their future trajectories. This is a difficult problem, especially in urban environments with interactions between many vehicles. Goal recognition methods must be fast to run in real time and make accurate inferences. As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified. Existing goal recognition methods for autonomous vehicles fail to satisfy all four objectives of being fast, accurate, interpretable and verifiable. We propose Goal Recognition with Interpretable Trees (GRIT), a goal recognition system which achieves these objectives. GRIT makes use of decision trees trained on vehicle trajectory data. We evaluate GRIT on two datasets, showing that GRIT achieved fast inference speed and comparable accuracy to two deep learning baselines, a planning-based goal recognition method, and an ablation of GRIT. We show that the learned trees are human interpretable and demonstrate how properties of GRIT can be formally verified using a satisfiability modulo theories (SMT) solver., Comment: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Published
- 2021
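One way to see how a learned decision tree can be formally verified with an SMT solver, as claimed in the entry above, is to encode the tree's branches as implications and ask the solver whether any input violates a desired property. The Python sketch below uses the z3-solver package on a toy one-feature, two-leaf tree and a trivial property; the feature, thresholds, and property are invented and far simpler than the verification performed in the paper.

from z3 import Real, Solver, And, Implies, Not, unsat

# Toy two-leaf decision tree over one trajectory feature:
# if speed < 5 m/s, predict P(goal = junction exit) = 0.9, otherwise 0.2.
speed, p_exit = Real("speed"), Real("p_exit")
tree = And(Implies(speed < 5, p_exit == 0.9),
           Implies(speed >= 5, p_exit == 0.2))

# Property to verify: the tree's output is always a valid probability in [0, 1].
solver = Solver()
solver.add(tree, Not(And(p_exit >= 0, p_exit <= 1)))
print("property verified" if solver.check() == unsat else "counterexample found")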
50. Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing
- Author
-
Christianos, Filippos, Papoudakis, Georgios, Rahman, Arrasy, and Albrecht, Stefano V.
- Subjects
Computer Science - Multiagent Systems ,Computer Science - Machine Learning - Abstract
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns., Comment: To be published In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021
- Published
- 2021
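The core mechanism in the entry above, selective parameter sharing, can be pictured as a mapping from agents to groups, with one policy network shared by all agents in a group. In the PyTorch sketch below the partition is hand-specified and the network sizes are arbitrary; the paper's contribution, as summarised, is to identify such a partition automatically from agents' abilities and goals.

import torch
import torch.nn as nn

# Agents are partitioned into groups; agents in the same group share one policy network.
partition = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}        # agent id -> group id

obs_dim, n_actions = 10, 5
group_policies = nn.ModuleDict({
    str(g): nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    for g in set(partition.values())
})

def act(agent_id, obs):
    # Each agent acts through the network owned by its group.
    logits = group_policies[str(partition[agent_id])](obs)
    return torch.distributions.Categorical(logits=logits).sample()

print(int(act(3, torch.randn(obs_dim))))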