293 results for "multi-agent learning"
Search Results
2. Acquisition of Cooperative Control of Multiple Vehicles Through Reinforcement Learning Utilizing Vehicle-to-Vehicle Communication and Map Information.
- Author
-
Suzuki, Tenta, Matsuda, Kenji, Kumagae, Kaito, Tobisawa, Mao, Hoshino, Junya, Itoh, Yuki, Harada, Tomohiro, Matsuoka, Jyouhei, Kagawa, Toshinori, and Hattori, Kiyohiko
- Subjects
- *MAPS, *INFORMATION resources management, *REINFORCEMENT learning
- Abstract
In recent years, extensive research has been conducted on the practical applications of autonomous driving. Much of this research relies on existing road infrastructure and aims to replace and automate human drivers. Concurrently, studies on zero-based control optimization focus on the effective use of road resources without assuming the presence of car lanes. These studies often overlook the physical constraints of vehicles in their control optimization based on reinforcement learning, leading to the learning of unrealistic control behaviors while simplifying the implementation of ranging sensors and vehicle-to-vehicle communication. Additionally, these studies do not use map information, which is widely employed in autonomous driving research. To address these issues, we constructed a simulation environment that incorporates physics simulations, realistically implements ranging sensors and vehicle-to-vehicle communication, and actively employs map information. Using this environment, we evaluated the effect of vehicle-to-vehicle communication and map information on vehicle control learning. Our experimental results show that vehicle-to-vehicle communication reduces collisions, while the use of map information improves the average vehicle speed and reduces the average lap time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Enhanced Naive Agent in Angry Birds AI Competition via Exploitation-Oriented Learning.
- Author
-
Miyazaki, Kazuteru
- Subjects
- *ARTIFICIAL intelligence, *REINFORCEMENT learning, *INTELLIGENCE officers, *GAME & game-birds, *CONTESTS, *PROFIT-sharing
- Abstract
The Angry Birds AI Competition engages artificial intelligence agents in a contest based on the game Angry Birds. This tournament has been conducted annually since 2012, with participants competing for high scores. The organizers of this competition provide a basic agent, termed "Naive Agent," as a baseline indicator. This study enhanced the Naive Agent by integrating a profit-sharing approach known as exploitation-oriented learning, which is a type of experience-enhanced learning. The effectiveness of this method was substantiated through numerical experiments. Additionally, this study explored the use of level selection learning within a multi-agent environment and validated the utility of the rationality theorem concerning the indirect rewards in this environment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Staged Reinforcement Learning for Complex Tasks Through Decomposed Environments
- Author
-
Pina, Rafael, Artaud, Corentin, Liu, Xiaolan, and De Silva, Varuna
- Published
- 2024
- Full Text
- View/download PDF
5. Research Progress in Multi-Agent Game Learning.
- Author
-
罗俊仁, 张万鹏, 苏炯铭, 袁唯淋, and 陈璟
- Published
- 2024
- Full Text
- View/download PDF
6. Deep-silicon photon-counting x-ray projection denoising through reinforcement learning.
- Author
-
Tanveer, Md Sayed, Wiedeman, Christopher, Li, Mengzhou, Shi, Yongyi, De Man, Bruno, Maltz, Jonathan S., and Wang, Ge
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *PHOTON counting, *ARTIFICIAL intelligence, *X-rays, *SPATIAL resolution
- Abstract
BACKGROUND: In recent years, deep reinforcement learning (RL) has been applied to various medical tasks and produced encouraging results. OBJECTIVE: In this paper, we demonstrate the feasibility of deep RL for denoising simulated deep-silicon photon-counting CT (PCCT) data in both full and interior scan modes. PCCT offers higher spatial and spectral resolution than conventional CT, requiring advanced denoising methods to suppress noise increase. METHODS: In this work, we apply a dueling double deep Q network (DDDQN) to denoise PCCT data for maximum contrast-to-noise ratio (CNR) and a multi-agent approach to handle data non-stationarity. RESULTS: Using our method, we obtained significant image quality improvement for single-channel scans and consistent improvement for all three channels of multichannel scans. For the single-channel interior scans, the PSNR (dB) and SSIM increased from 33.4078 and 0.9165 to 37.4167 and 0.9790 respectively. For the multichannel interior scans, the channel-wise PSNR (dB) increased from 31.2348, 30.7114, and 30.4667 to 31.6182, 30.9783, and 30.8427 respectively. Similarly, the SSIM improved from 0.9415, 0.9445, and 0.9336 to 0.9504, 0.9493, and 0.0326 respectively. CONCLUSIONS: Our results show that the RL approach improves image quality effectively, efficiently, and consistently across multiple spectral channels and has great potential in clinical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
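The dueling double deep Q-network (DDDQN) named in entry 6 combines a dueling value/advantage decomposition with double-DQN target evaluation. A minimal generic sketch of those two ingredients follows; layer sizes, inputs, and the action set are illustrative assumptions, not the authors' denoising implementation.

```python
# Generic dueling double-DQN sketch (the "DDDQN" ingredient in entry 6).
# Layer sizes, the feature input, and the action set are illustrative
# assumptions, not the authors' denoising implementation.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)               # state-value stream
        self.advantage = nn.Linear(64, n_actions)   # advantage stream

    def forward(self, x):
        h = self.trunk(x)
        a = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

def double_dqn_target(online, target, reward, next_obs, gamma=0.99):
    """Double DQN: the online net picks the action, the target net rates it."""
    with torch.no_grad():
        next_a = online(next_obs).argmax(dim=-1, keepdim=True)
        return reward + gamma * target(next_obs).gather(-1, next_a).squeeze(-1)
```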
7. Hierarchical Reward Model of Deep Reinforcement Learning for Enhancing Cooperative Behavior in Automated Driving.
- Author
-
Matsuda, Kenji, Suzuki, Tenta, Harada, Tomohiro, Matsuoka, Johei, Tobisawa, Mao, Hoshino, Jyunya, Itoh, Yuuki, Kumagae, Kaito, Kagawa, Toshinori, and Hattori, Kiyohiko
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *REWARD (Psychology), *GROUP work in education, *AUTOMOBILE driving, *SEARCHING behavior, *MOTOR vehicle driving
- Abstract
In recent years, studies on the practical application of automated driving have been conducted extensively. Most of this research assumes the existing road infrastructure and aims to replace human driving. There have also been studies that use reinforcement learning to optimize car control from a zero-based perspective in an environment without lanes, one of the existing road types. In those studies, search and behavior acquisition using reinforcement learning have resulted in efficient driving control in an unknown environment. However, the throughput has not been high, while the crash rate has. To address this issue, this study proposes a hierarchical reward model that uses both individual and common rewards for reinforcement learning, in order to achieve efficient driving control on roads we assume to be one-way, lane-less, and automobile-only. Automated driving control is trained using the hierarchical reward model and evaluated through physical simulations. The results show that a reduction in crash rate and an improvement in throughput are attained by increasing the number of behaviors in which faster cars actively overtake slower ones. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
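The hierarchical reward model in entry 7 mixes individual and common rewards into one learning signal. A toy sketch under assumed weights and reward terms, not the paper's exact design:

```python
# Toy sketch of the hierarchical reward idea from entry 7: blend a
# per-vehicle term (own speed/progress) with a group-level term (e.g.,
# collision-free step). Weights and terms are illustrative assumptions.
def hierarchical_reward(individual_reward: float,
                        common_reward: float,
                        w_individual: float = 0.5,
                        w_common: float = 0.5) -> float:
    """Blend per-agent and group-level rewards into one scalar signal."""
    return w_individual * individual_reward + w_common * common_reward

# Example: a fast lap (individual) during a collision-free step (common).
r = hierarchical_reward(individual_reward=1.0, common_reward=0.2)
```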
8. Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing
- Author
-
Hadar Szostak and Kobi Cohen
- Subjects
- Active hypothesis testing (AHT), controlled sensing for multihypothesis testing, decentralized inference, deep reinforcement learning (DRL), multi-agent learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
We consider a decentralized formulation of the active hypothesis testing (AHT) problem, where multiple agents gather noisy observations from the environment with the purpose of identifying the correct hypothesis. At each time step, agents have the option to select a sampling action. These different actions result in observations drawn from various distributions, each associated with a specific hypothesis. The agents collaborate to accomplish the task, where message exchanges between agents are allowed over a rate-limited communications channel. The objective is to devise a multi-agent policy that minimizes the Bayes risk. This risk comprises both the cost of sampling and the joint terminal cost incurred by the agents upon making a hypothesis declaration. Deriving optimal structured policies for AHT problems is generally mathematically intractable, even in the context of a single agent. As a result, recent efforts have turned to deep learning methodologies to address these problems, which have exhibited significant success in single-agent learning scenarios. In this paper, we tackle the multi-agent AHT formulation by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. This algorithm, named Multi-Agent Reinforcement Learning for AHT (MARLA), operates at each time step by having each agent map its state to an action (sampling rule or stopping rule) using a trained deep neural network with the goal of minimizing the Bayes risk. We present a comprehensive set of experimental results that effectively showcase the agents’ ability to learn collaborative strategies and enhance performance using MARLA. Furthermore, we demonstrate the superiority of MARLA over single-agent learning approaches. Finally, we provide an open-source implementation of the MARLA framework, for the benefit of researchers and developers in related domains.
- Published
- 2024
- Full Text
- View/download PDF
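Entry 8's MARLA loop has each agent map its local state through a trained network to either a sampling action or the stopping (declaration) action. A minimal sketch with an assumed toy network and action layout:

```python
# Minimal sketch of the per-step MARLA decision from entry 8: the state
# maps through a trained network to a sampling rule or a stopping rule.
# The network, state encoding, and action layout are assumptions.
import numpy as np

N_SAMPLING_ACTIONS = 4
STOP_ACTION = N_SAMPLING_ACTIONS  # last index declares a hypothesis

def agent_step(q_network, state: np.ndarray) -> int:
    """Greedy action: a sampling rule or the stopping rule."""
    q_values = q_network(state)      # shape: (N_SAMPLING_ACTIONS + 1,)
    return int(np.argmax(q_values))

# Toy stand-in for a trained network, for illustration only.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, N_SAMPLING_ACTIONS + 1))
q_network = lambda s: s @ weights
action = agent_step(q_network, rng.normal(size=8))
```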
9. Scaling Up Multi-Agent Reinforcement Learning: An Extensive Survey on Scalability Issues
- Author
-
Dingbang Liu, Fenghui Ren, Jun Yan, Guoxin Su, Wen Gu, and Shohei Kato
- Subjects
- Multi-agent learning, reinforcement learning, scalability, collective learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Multi-agent learning has made significant strides in recent years. Benefiting from deep learning, multi-agent deep reinforcement learning (MADRL) has transcended traditional limitations seen in tabular tasks, arousing tremendous research interest. However, compared to other challenges in MADRL, scalability remains underemphasized, impeding the application of MADRL in complex scenarios. Scalability stands as a foundational attribute of the multi-agent system (MAS), offering a potent approach to understand and improve collective learning among agents. It encompasses the capacity to handle the increasing state-action space which arises not only from a large number of agents but also from other factors related to agents and environment. In contrast to prior surveys, this work provides a comprehensive exposition of scalability concerns in MADRL. We first introduce foundational knowledge about deep reinforcement learning and MADRL to underscore the distinctiveness of scalability issues in this domain. Subsequently, we delve into the problems posed by scalability, examining agent complexity, environment complexity, and robustness against perturbation. We elaborate on the methods that demonstrate the evolution of scalable algorithms. To conclude this survey, we discuss challenges, identify trends, and outline possible directions for future work on scalability issues. It is our aspiration that this survey enhances the understanding of researchers in this field, providing a valuable resource for in-depth exploration.
- Published
- 2024
- Full Text
- View/download PDF
10. A New Distributed Architecture Based on Reinforcement Learning for Parameter Estimation in Image Processing
- Author
-
Qaffou, Issam
- Published
- 2023
- Full Text
- View/download PDF
11. On Implementing a Simulation Environment for a Cooperative Multi-agent Learning Approach to Mitigate DRDoS Attacks
- Author
-
Kawazoe, Tomoki and Fukuta, Naoki
- Published
- 2023
- Full Text
- View/download PDF
12. Learning in the Presence of Multiple Agents
- Author
-
Ramponi, Giorgia
- Published
- 2023
- Full Text
- View/download PDF
13. Cooperative Multi-Agent Nash Q-Learning (CMNQL) for Decision Building in Retail Shop
- Author
-
Vidhate, Deepak A. and Kulkarni, Parag
- Published
- 2023
- Full Text
- View/download PDF
14. Emergent cooperation from mutual acknowledgment exchange in multi-agent reinforcement learning
- Author
-
Phan, Thomy, Sommer, Felix, Ritz, Fabian, Altmann, Philipp, Nüßlein, Jonas, Kölle, Michael, Belzner, Lenz, and Linnhoff-Popien, Claudia
- Published
- 2024
- Full Text
- View/download PDF
15. A deep reinforcement learning strategy for autonomous robot flocking.
- Author
-
Martínez, Fredy, Montiel, Holman, and Wanumen, Luis
- Subjects
- DEEP reinforcement learning, AUTONOMOUS robots, REINFORCEMENT learning, ANIMAL social behavior, MULTIAGENT systems, INTELLIGENCE levels
- Abstract
Social behaviors in animals such as bees, ants, and birds have shown high levels of intelligence from a multi-agent system perspective. They present viable solutions to real-world problems, particularly in navigating constrained environments with simple robotic platforms. Among these behaviors is swarm flocking, which has been extensively studied for this purpose. Flocking algorithms have been developed from basic behavioral rules, which often require parameter tuning for specific applications. However, the lack of a general formulation for tuning has made these strategies difficult to implement in various real conditions, and even to replicate laboratory behaviors. In this paper, we propose a flocking scheme for small autonomous robots that can self-learn in dynamic environments, derived from a deep reinforcement learning process. Our approach achieves flocking independently of population size and environmental characteristics, with minimal external intervention. Our multi-agent system model considers each agent’s action as a linear function dynamically adjusting the motion according to interactions with other agents and the environment. Our strategy is an important contribution toward real-world flocking implementation. We demonstrate that our approach allows for autonomous flocking in the system without requiring specific parameter tuning, making it ideal for applications where there [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
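Entry 15 models each agent's action as a linear function that dynamically adjusts motion according to interactions with other agents and the environment. A minimal sketch with boids-style features and assumed coefficients standing in for what deep RL would learn:

```python
# Minimal sketch of the linear per-agent action model from entry 15.
# The cohesion/alignment/separation features and the weights are
# illustrative assumptions, not the paper's trained policy.
import numpy as np

def agent_action(pos, vel, neighbour_pos, neighbour_vel, weights):
    """Linear policy: weighted sum of interaction features."""
    cohesion = neighbour_pos.mean(axis=0) - pos      # move toward group
    alignment = neighbour_vel.mean(axis=0) - vel     # match group velocity
    separation = (pos - neighbour_pos).mean(axis=0)  # avoid crowding
    features = np.stack([cohesion, alignment, separation])  # (3, dim)
    return weights @ features                        # motion adjustment

weights = np.array([0.4, 0.3, 0.2])  # coefficients deep RL would learn
rng = np.random.default_rng(2)
action = agent_action(rng.normal(size=2), rng.normal(size=2),
                      rng.normal(size=(5, 2)), rng.normal(size=(5, 2)),
                      weights)
```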
16. Fighting for Routes: Resource Allocation among Competing Planners in Transportation Networks.
- Author
-
Roman, Charlotte and Turrini, Paolo
- Subjects
- *RESOURCE allocation, *PLANNERS, *COST functions, *SOCIAL services, *PRICES, *REINFORCEMENT learning, *ROUTE choice
- Abstract
In transportation networks, incomplete information is ubiquitous, and users often delegate their route choice to distributed route planners. To model and study these systems, we introduce network control games, consisting of multiple actors seeking to optimise the social welfare of their assigned subpopulations through resource allocation in an underlying nonatomic congestion game. We first analyse the inefficiency of the routing equilibria by calculating the Price of Anarchy for polynomial cost functions, and then, using an Asynchronous Advantage Actor–Critic algorithm implementation, we show that reinforcement learning agents are vulnerable to choosing suboptimal routing as predicted by the theory. Finally, we extend the analysis to allow vehicles to choose their route planner and study the associated equilibria. Our results can be applied to mitigate inefficiency issues arising in large transport networks with route controlled autonomous vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Causal inference multi-agent reinforcement learning for traffic signal control.
- Author
-
Yang, Shantian, Yang, Bo, Zeng, Zheng, and Kang, Zhongfeng
- Subjects
- *TRAFFIC signs & signals, *REINFORCEMENT learning, *TRAFFIC engineering, *CAUSAL inference
- Abstract
• A Causal-Inference (CI) model is designed for the non-stationary multi-agent environment.
• Combining it with multi-agent learning, a CI-MA algorithm is proposed for traffic signal control.
• Different granularities of traffic information are fused for feature representation.
• A representation loss function and an MA loss function are designed for joint optimization.
• Experiments show that the CI-MA algorithm outperforms state-of-the-art algorithms.

A primary challenge in multi-agent reinforcement learning for traffic signal control is to produce effective cooperative traffic-signal policies in non-stationary multi-agent traffic environments. However, each agent suffers from its local non-stationary traffic environment caused by the time-varying traffic-signal policies of adjacent agents; at the same time, different agents also produce time-varying traffic-signal policies, which further results in the non-stationarity of the whole traffic environment, so the produced traffic-signal policies may be ineffective. In this work, we propose a Causal Inference Multi-Agent reinforcement learning (CI-MA) algorithm, which can alleviate the non-stationarity of multi-agent traffic environments from both feature representation and optimization, and eventually helps to produce effective cooperative traffic-signal policies. Specifically, a Causal-Inference (CI) model is first designed to reason about and tackle the non-stationarity of multi-agent traffic environments by both acquiring feature representation distributions and deriving variational lower bounds (i.e., objective functions). Then, based on the designed CI model, we propose a CI-MA algorithm, in which feature representations are acquired from the non-stationarity of multi-agent traffic environments at both the task level and the timestep level, and the acquired feature representations are used to produce cooperative traffic-signal policies and Q-values for multiple agents. Finally, the corresponding objective functions optimize the whole algorithm from both causal inference and multi-agent reinforcement learning. Experiments are conducted in different non-stationary multi-agent traffic environments. Results show that the CI-MA algorithm outperforms other state-of-the-art algorithms and demonstrate that the proposed algorithm, trained in synthetic-traffic environments, can be effectively transferred to both synthetic- and real-traffic environments with non-stationarity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Multi-agent adaptive routing by multi-head-attention-based twin agents using reinforcement learning
- Author
-
Timofey A. Gribanov, Andrey A. Filchenkov, Artur A. Azarov, and Anatoly A. Shalyto
- Subjects
- routing, multi-agent learning, reinforcement learning, adaptive routing, Optics. Light, QC350-467, Electronic computers. Computer science, QA75.5-76.95
- Abstract
A common condition, typical of packet routing, cargo transportation, and flow control problems, is the variability of the graph. Reinforcement-learning-based adaptive routing algorithms are designed to solve the routing problem under this condition. However, with significant changes in the graph, existing routing algorithms require complete retraining. To handle this challenge, we propose a novel method based on multi-agent modeling with twin agents, for which a new neural network architecture with multi-headed internal attention is proposed, pre-trained within the framework of the multi-view learning paradigm. An agent in such a paradigm uses a vertex as its input; twins of the main agent are placed at the vertices of the graph and select the neighbor to which the object should be transferred. We carried out a comparative analysis with the existing DQN-LE-routing multi-agent routing algorithm on two stages: pre-training and simulation. In both cases, runs were considered in which the topology changes during testing or simulation. Experiments have shown that the proposed adaptability enhancement method provides global adaptability, increasing delivery time by only 14.5% after global changes occur. The proposed method can be used to solve routing problems with complex path evaluation functions and dynamically changing graph topologies, for example, in transport logistics and for managing conveyor belts in production.
- Published
- 2022
- Full Text
- View/download PDF
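Entry 18 places twins of one shared agent at the vertices of the graph, each selecting the neighbour to which the object should be transferred. A minimal sketch in which a simple linear scorer stands in for the paper's multi-head internal-attention network:

```python
# Minimal sketch of the twin-agent routing step from entry 18: a shared
# policy replicated at every vertex scores each neighbour as the next
# hop. The linear scorer and feature layout are assumptions standing in
# for the paper's multi-head internal-attention architecture.
import numpy as np

def choose_next_hop(score_net, vertex_feat, neighbour_feats, dest_feat):
    """Score every neighbour with the shared network; route to argmax."""
    scores = [
        score_net(np.concatenate([vertex_feat, n, dest_feat]))
        for n in neighbour_feats
    ]
    return int(np.argmax(scores))

# Toy stand-in for the trained scorer, illustration only.
rng = np.random.default_rng(1)
w = rng.normal(size=12)
score_net = lambda x: float(w @ x)
hop = choose_next_hop(score_net, rng.normal(size=4),
                      [rng.normal(size=4) for _ in range(3)],
                      rng.normal(size=4))
```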
19. Joint learning of agents and graph embeddings in a conveyor belt control problem
- Author
-
Konstantin E. Rybkin, Andrey A. Filchenkov, Artur A. Azarov, Alexey S. Zabashta, and Anatoly A. Shalyto
- Subjects
- multi-agent learning, reinforcement learning, adaptive routing, conveyor belt, graph representation, Optics. Light, QC350-467, Electronic computers. Computer science, QA75.5-76.95
- Abstract
We focus on the problem of routing in a conveyor belt system based on a multi-agent approach. Most airport baggage belt conveyor systems use routing algorithms based on manual simulation of conveyor behavior. This approach does not scale well, and new research in machine learning proposes to solve the routing problem using reinforcement learning. To this end, we propose an approach to the joint learning of agents and vector representations of a graph. Within this approach, we develop a QSDNE algorithm, which uses DQN agents and SDNE embeddings. A comparative analysis was carried out with multi-agent routing algorithms without joint learning. The results of the QSDNE algorithm showed its effectiveness in optimizing delivery time and energy consumption in conveyor systems, as it helped to reduce mean delivery time by 6%. The proposed approach can be used to solve routing problems with complex path estimation functions and dynamically changing graph topologies, and the proposed algorithm can be used to control conveyor belts at airports and in manufacturing workshops.
- Published
- 2022
- Full Text
- View/download PDF
20. Signal Instructed Coordination in Cooperative Multi-agent Reinforcement Learning
- Author
-
Chen, Liheng, Guo, Hongyi, Du, Yali, Fang, Fei, Zhang, Haifeng, Zhang, Weinan, and Yu, Yong
- Published
- 2022
- Full Text
- View/download PDF
21. Deep Reinforcement Ant Colony Optimization for Swarm Learning
- Author
-
Bolshakov, Vladislav, Alfimtsev, Alexander, Sakulin, Sergey, and Bykov, Nikita
- Published
- 2022
- Full Text
- View/download PDF
22. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey.
- Author
-
Orr, James and Dutta, Ayan
- Subjects
- *DEEP reinforcement learning, *MEDICAL care, *ROBOTICS
- Abstract
Deep reinforcement learning has produced many success stories in recent years. Some example fields in which these successes have taken place include mathematics, games, health care, and robotics. In this paper, we are especially interested in multi-agent deep reinforcement learning, where multiple agents present in the environment not only learn from their own experiences but also from each other, and in its applications in multi-robot systems. In many real-world scenarios, one robot might not be enough to complete the given task on its own, and, therefore, we might need to deploy multiple robots that work together towards a common global objective of finishing the task. Although multi-agent deep reinforcement learning and its applications in multi-robot systems are of tremendous significance from theoretical and applied standpoints, the latest survey in this domain dates to 2004, albeit for traditional learning applications, as deep reinforcement learning had not yet been invented. We classify the reviewed papers in our survey primarily based on their multi-robot applications. Our survey also discusses a few challenges that current research in this domain faces and provides a potential list of future applications involving multi-robot systems that can benefit from advances in multi-agent deep reinforcement learning. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. Learning to play against any mixture of opponents
- Author
-
Max Olan Smith, Thomas Anthony, and Michael P. Wellman
- Subjects
- reinforcement learning (RL), transfer learning (TL), deep reinforcement learning (deep RL), value-based reinforcement learning, multi-agent learning, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. If the mixture changes, ideally we would not have to train from scratch, but rather could transfer what we have learned to construct a policy to play against the new mixture. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment, and a social dilemma game. Our experiments find that Q-Mixing can successfully transfer knowledge across any mixture of opponents. Next, we consider the use of observations during play to update the believed distribution of opponents. We introduce an opponent policy classifier—trained reusing Q-learning data—and use the classifier results to refine the mixing of Q-values. Q-Mixing augmented with the opponent policy classifier performs better, with higher variance, than training directly against a mixed-strategy opponent.
- Published
- 2023
- Full Text
- View/download PDF
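Entry 23's Q-Mixing rule reduces to averaging separately learned per-opponent Q-values, weighted by the believed opponent mixture. A minimal sketch with assumed shapes:

```python
# Minimal sketch of Q-Mixing from entry 23: Q-values learned against
# each pure-strategy opponent are averaged under the believed opponent
# distribution, giving a policy against any mixture without retraining.
# Array shapes are assumptions for illustration.
import numpy as np

def q_mixing(q_per_opponent: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """q_per_opponent: (n_opponents, n_actions) Q-values for one state.
    mixture: (n_opponents,) probabilities over opponent strategies."""
    return mixture @ q_per_opponent  # expected Q under the mixture

# Two pure-strategy opponents, three actions; act greedily vs a 70/30 mix.
q = np.array([[1.0, 0.2, 0.5],
              [0.1, 0.9, 0.3]])
action = int(np.argmax(q_mixing(q, np.array([0.7, 0.3]))))
```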
24. Behavior analysis of emergent rule discovery for cooperative automated driving using deep reinforcement learning.
- Author
-
Harada, Tomohiro, Matsuoka, Johei, and Hattori, Kiyohiko
- Abstract
With improvements in AI technology and sensor performance, research on automated driving has become increasingly popular. However, most studies are based on human driving styles. In this study, we consider an environment in which only autonomous vehicles are present. In such an environment, it is essential to develop an appropriate control method that actively utilizes the characteristics of autonomous vehicles, such as dense information exchange and highly accurate vehicle control. To address this issue, we investigated the emergence of automatic driving rules using reinforcement learning based on information from surrounding vehicles obtained through inter-vehicle communication. We evaluated whether reinforcement learning converges in a situation where distance sensor information can be shared in real time using vehicle-to-vehicle communication and whether reinforcement learning can learn a rational driving method. The simulation results show a positive trend in the cumulative reward value, indicating that the proposed multi-agent learning method with an extended own-vehicle environment has the potential to learn automated vehicle control with cooperative behavior automatically. Furthermore, we analyzed whether a rational driving method (action selection) can be learned by reinforcement learning. The simulation results showed that reinforcement learning achieves rational control of overtaking behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.
- Author
-
Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, and Weinan Zhang
- Subjects
- *MACHINE learning, *REINFORCEMENT learning, *MARL
- Abstract
Population-based multi-agent reinforcement learning (PB-MARL) encompasses a range of methods that merge dynamic population selection with multi-agent reinforcement learning algorithms (MARL). While PB-MARL has demonstrated notable achievements in complex multi-agent tasks, its sequential execution is plagued by low computational efficiency due to the diversity in computing patterns and policy combinations. We propose a solution involving a stateless central task dispatcher and stateful workers to handle PB-MARL's subroutines, thereby capitalizing on parallelism across various components for efficient problem-solving. In line with this approach, we introduce MALib, a parallel framework that incorporates a task control model, independent data servers, and an abstraction of MARL training paradigms. The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). [ABSTRACT FROM AUTHOR]
- Published
- 2023
26. Many-to-Many Data Aggregation Scheduling Based on Multi-Agent Learning for Multi-Channel WSN.
- Author
-
Lu, Yao, Wang, Keweiqi, and He, Erbao
- Subjects
- DISTRIBUTED algorithms, MULTICHANNEL communication, WIRELESS sensor networks, WIRELESS channels, SCHEDULING, MULTIAGENT systems, WIRELESS communications
- Abstract
Many-to-many data aggregation has become an indispensable technique for the simultaneous execution of multiple applications with less data traffic load and less energy consumption in a multi-channel WSN (wireless sensor network). How to efficiently allocate a time slot and channel to each node is one of the most critical problems for many-to-many data aggregation in multi-channel WSNs, and it can be solved with the new distributed, conflict-free scheduling method outlined in this paper. The many-to-many data aggregation scheduling process is abstracted as a decentralized partially observable Markov decision model in a multi-agent system. By embedding cooperative multi-agent learning technology, sensor nodes with group observability work in a distributed manner. These nodes cooperate and exploit local feedback information to automatically learn the optimal scheduling strategy, then select the best time slot and channel for wireless communication. Simulation results show that the new scheduling method has performance advantages when compared with existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
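Entry 26 casts scheduling as decentralized agents choosing a (time slot, channel) pair from local feedback. A minimal sketch with an assumed bandit-style tabular learner, not the paper's exact cooperative rule:

```python
# Minimal sketch of the decentralized scheduling idea from entry 26:
# each sensor node is an agent whose action is a (time slot, channel)
# pair learned from local feedback (e.g., conflict-free transmission).
# The tabular learner and reward convention are assumptions.
import numpy as np

N_SLOTS, N_CHANNELS = 8, 4

class NodeAgent:
    def __init__(self, lr=0.1, eps=0.1):
        self.q = np.zeros((N_SLOTS, N_CHANNELS))
        self.lr, self.eps = lr, eps

    def act(self, rng):
        if rng.random() < self.eps:                  # explore
            return rng.integers(N_SLOTS), rng.integers(N_CHANNELS)
        return np.unravel_index(self.q.argmax(), self.q.shape)  # exploit

    def update(self, slot, ch, reward):
        # Stateless bandit-style update from local conflict feedback.
        self.q[slot, ch] += self.lr * (reward - self.q[slot, ch])

rng = np.random.default_rng(3)
agent = NodeAgent()
slot, ch = agent.act(rng)
agent.update(slot, ch, reward=1.0)  # e.g., +1 for a conflict-free slot
```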
27. Joint Trajectory and Handover Management for UAVs Co-existing with Terrestrial Users : Deep Reinforcement Learning Based Approaches
- Author
-
Deng, Yuhang
- Abstract
Integrating unmanned aerial vehicles (UAVs) as aerial user equipments (UEs) into cellular networks is now considered a promising solution to provide extensive wireless connectivity for supporting UAV-centric commercial or civilian applications. However, the co-existence of UAVs with conventional terrestrial UEs is one of the primary challenges for this solution. Flying at higher altitudes with a maneuverability advantage, UAVs are able to establish line-of-sight (LoS) connectivity with more base stations (BSs) than terrestrial UEs. Although LoS connectivity reduces the communication delay of UAVs, it simultaneously increases the interference that UAVs cause to terrestrial UEs. In scenarios involving multiple UAVs, LoS connectivity can even lead to interference issues among the UAVs themselves. In addition, LoS connectivity leads to extensive overlapping coverage areas of multiple BSs for UAVs, forcing them to perform frequent handovers during flight if a received signal strength (RSS)-based handover policy is employed. The trajectories and BS associations of UAVs, along with their radio resource allocation, are essential design parameters aimed at enabling their seamless integration into cellular networks, with a particular focus on managing the interference levels they generate and reducing the redundant handovers they perform. Hence, this thesis designs two joint trajectory and handover management approaches, for single-UAV and multi-UAV scenarios respectively, aiming to minimize the weighted sum of three key performance indicators (KPIs): transmission delay, up-link interference, and handover number. The approaches are based on deep reinforcement learning (DRL) frameworks, with dueling double deep Q-network (D3QN) and Q-learning with a MIXer network (QMIX) algorithms selected as the training agents, respectively. The choice of these DRL algorithms is motivated by their capability in designing sequential decision-making policies consisting of trajectory design and handover management.
- Published
- 2024
28. Multi-agent Service Area Adaptation for Ride-Sharing Using Deep Reinforcement Learning
- Author
-
Yoshida, Naoki, Noda, Itsuki, and Sugawara, Toshiharu
- Published
- 2020
- Full Text
- View/download PDF
29. Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games.
- Author
-
Hao, Dong, Zhang, Dongcheng, Shi, Qi, and Li, Kai
- Subjects
- *REINFORCEMENT learning, *MAXIMUM entropy method, *EDUCATIONAL games, *ENTROPY, *LAGRANGE equations, *MARL
- Abstract
Multi-agent reinforcement learning (MARL) is an abstract framework modeling a dynamic environment that involves multiple learning and decision-making agents, each of which tries to maximize her cumulative reward. In MARL, each agent discovers a strategy alongside others and adapts her policy in response to the behavioural changes of others. A fundamental difficulty faced by MARL is that every agent is dynamically learning and changing to improve her reward, making the whole system unstable and agents' policies difficult to converge. In this paper, we introduce an entropy regularizer into the Bellman equation and utilize the Lagrange approach to optimize the entropy regularizer. We then propose a MARL algorithm based on the maximum entropy principle and the actor-critic method. This algorithm follows the policy gradient approach and uses a policy network and a value network. We call it Multi-Agent Deep Soft Policy Gradient (MADSPG). Then, using the Lagrange approach and dynamic minimax optimization, we propose the AUTO-MADSPG algorithm with an automatically adjusted entropy regularizer. These algorithms make multi-agent learning more stable while guaranteeing sufficient exploration. Finally, we also incorporate MADSPG with the recently proposed opponent modeling component into an integrated framework. This framework outperforms many state-of-the-art MARL algorithms in conventional cooperative and competitive game settings. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
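Entry 29 introduces an entropy regularizer into the Bellman equation. A small tabular sketch of the resulting soft value target follows; the temperature alpha and the tabular form are assumptions (MADSPG itself uses policy and value networks):

```python
# Minimal sketch of an entropy-regularized (soft) Bellman backup, the
# idea entry 29 builds on: the value target adds an entropy bonus so
# agents keep exploring. alpha and the tabular setting are assumptions.
import numpy as np

def soft_value(q_values: np.ndarray, alpha: float) -> float:
    """Soft state value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)."""
    return alpha * np.log(np.sum(np.exp(q_values / alpha)))

def soft_bellman_target(reward, next_q, alpha=0.2, gamma=0.99):
    """Entropy-regularized target: r + gamma * V_soft(s')."""
    return reward + gamma * soft_value(next_q, alpha)

target = soft_bellman_target(1.0, np.array([0.5, 0.1, -0.2]))
```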
30. Techniques and Paradigms in Modern Game AI Systems.
- Author
-
Lu, Yunlong and Li, Wenxin
- Subjects
- *REINFORCEMENT learning, *VIDEO game culture
- Abstract
Games have long been benchmarks and test-beds for AI algorithms. With the development of AI techniques and the boost of computational power, modern game AI systems have achieved superhuman performance in many games played by humans. These games have various features and present different challenges to AI research, so the algorithms used in each of these AI systems vary. This survey aims to give a systematic review of the techniques and paradigms used in modern game AI systems. By decomposing each of the recent milestones into basic components and comparing them based on the features of games, we summarize the common paradigms to build game AI systems and their scope and limitations. We claim that deep reinforcement learning is the most general methodology to become a mainstream method for games with higher complexity. We hope this survey can both provide a review of game AI algorithms and bring inspiration to the game AI community for future directions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. META: A City-Wide Taxi Repositioning Framework Based on Multi-Agent Reinforcement Learning.
- Author
-
Liu, Chenxi, Chen, Chao-Xiong, and Chen, Chao
- Abstract
The popularity of online ride-hailing platforms has made people travel smarter than ever before. But people still frequently encounter the dilemma of “taxi drivers hunt passengers and passengers search for unoccupied taxis”. Many studies try to reposition idle taxis to alleviate such issues by using reinforcement learning based methods, as they are capable of capturing future demand/supply dynamics. However, they either coordinate all city-wide taxis in a centralized manner or treat all taxis in a region homogeneously, resulting in inefficient or inaccurate learning performance. In this paper, we propose a multi-agent reinforcement learning based framework named META (MakE Taxi Act differently in each agent) to mitigate the disequilibrium of supply and demand via repositioning taxis at the city scale. We decompose it into two subproblems, i.e., taxi demand/supply determination and taxi dispatching strategy formulation. Two components are wisely built in META to address the gap collaboratively, in which each region is regarded as an agent and taxis inside the region can make two different actions. Extensive experiments demonstrate that META outperforms existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. A multi‐agent based mechanism for collaboratively detecting distributed denial of service attacks in internet of vehicles.
- Author
-
Dong, Tingting, Chen, Lei, Zhou, Li, Xue, Fei, and Qin, Huilin
- Subjects
- DENIAL of service attacks, HIDDEN Markov models, VITERBI decoding, REINFORCEMENT learning, BOTNETS, INTRUSION detection systems (Computer security)
- Abstract
Distributed denial of service (DDoS) attacks have become a hidden danger in the development of the internet of vehicles (IoV). DDoS attacks on the TCP protocol are studied to improve the information security environment of IoV. To address the distributed nature of DDoS attacks, an information-sharing and collaborative detection mechanism based on multiple agents is proposed. Considering the relationship between the features of adjacent moments in TCP communication, a DDoS detection model based on a hidden Markov model is built, and the Viterbi algorithm is improved to address false alarms in the observation sequence. The optimal communication strategy among agents is determined by deep reinforcement learning, and a fusion algorithm is designed to improve the current strategy of the agents. Three groups of comparative experiments are designed and analyzed. The simulation results show that the proposed algorithms are effective. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
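Entry 32 builds its detection model on a hidden Markov model decoded with an improved Viterbi algorithm. For context, a standard Viterbi pass over a two-state normal/attack model with assumed probabilities, not the paper's fitted model or its improved variant:

```python
# Standard Viterbi decoding, the HMM step entry 32 builds on: recover
# the most likely hidden state path (normal vs. under-attack) from
# observed traffic features. Model and probabilities are assumptions.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """obs: observation indices; returns the most likely state path."""
    logp = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = []
    for o in obs[1:]:
        cand = logp[:, None] + np.log(trans_p)   # (from_state, to_state)
        back.append(cand.argmax(axis=0))         # best predecessor
        logp = cand.max(axis=0) + np.log(emit_p[:, o])
    path = [int(logp.argmax())]
    for ptr in reversed(back):                   # backtrack
        path.append(int(ptr[path[-1]]))
    return path[::-1]

states = viterbi(obs=[0, 1, 1],
                 start_p=np.array([0.9, 0.1]),   # normal, attack
                 trans_p=np.array([[0.8, 0.2], [0.3, 0.7]]),
                 emit_p=np.array([[0.7, 0.3], [0.2, 0.8]]))
```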
33. Fighting for Routes: Resource Allocation among Competing Planners in Transportation Networks
- Author
-
Charlotte Roman and Paolo Turrini
- Subjects
- resource allocation, congestion games, multi-agent learning, efficiency, Technology, Social Sciences
- Abstract
In transportation networks, incomplete information is ubiquitous, and users often delegate their route choice to distributed route planners. To model and study these systems, we introduce network control games, consisting of multiple actors seeking to optimise the social welfare of their assigned subpopulations through resource allocation in an underlying nonatomic congestion game. We first analyse the inefficiency of the routing equilibria by calculating the Price of Anarchy for polynomial cost functions, and then, using an Asynchronous Advantage Actor–Critic algorithm implementation, we show that reinforcement learning agents are vulnerable to choosing suboptimal routing as predicted by the theory. Finally, we extend the analysis to allow vehicles to choose their route planner and study the associated equilibria. Our results can be applied to mitigate inefficiency issues arising in large transport networks with route controlled autonomous vehicles.
- Published
- 2023
- Full Text
- View/download PDF
34. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey
- Author
-
James Orr and Ayan Dutta
- Subjects
- deep reinforcement learning, multi-robot systems, multi-agent learning, survey, Chemical technology, TP1-1185
- Abstract
Deep reinforcement learning has produced many success stories in recent years. Some example fields in which these successes have taken place include mathematics, games, health care, and robotics. In this paper, we are especially interested in multi-agent deep reinforcement learning, where multiple agents present in the environment not only learn from their own experiences but also from each other, and in its applications in multi-robot systems. In many real-world scenarios, one robot might not be enough to complete the given task on its own, and, therefore, we might need to deploy multiple robots that work together towards a common global objective of finishing the task. Although multi-agent deep reinforcement learning and its applications in multi-robot systems are of tremendous significance from theoretical and applied standpoints, the latest survey in this domain dates to 2004, albeit for traditional learning applications, as deep reinforcement learning had not yet been invented. We classify the reviewed papers in our survey primarily based on their multi-robot applications. Our survey also discusses a few challenges that current research in this domain faces and provides a potential list of future applications involving multi-robot systems that can benefit from advances in multi-agent deep reinforcement learning.
- Published
- 2023
- Full Text
- View/download PDF
35. CONTINUOUS-TIME CONVERGENCE RATES IN POTENTIAL AND MONOTONE GAMES.
- Author
-
Gao, Bolin and Pavel, Lacra
- Subjects
- *NASH equilibrium, *POTENTIAL functions, *TELEVISION game programs, *GAMES, *EQUILIBRIUM
- Abstract
In this paper, we provide exponential rates of convergence to the interior Nash equilibrium for continuous-time dual-space game dynamics such as mirror descent (MD) and actor-critic (AC). We perform our analysis in N-player continuous concave games that satisfy certain monotonicity assumptions while possibly also admitting potential functions. In the first part of this paper, we provide a novel relative characterization of monotone games and show that MD and its discounted version converge with rate $\mathcal{O}(e^{-\beta t})$ in relatively strongly and relatively hypomonotone games, respectively. In the second part of this paper, we specialize our results to games that admit a relatively strongly concave potential and show that AC converges with rate $\mathcal{O}(e^{-\beta t})$. These rates extend their known convergence conditions. Simulations are performed which empirically back up our results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
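Entry 35's rates can be stated schematically as below; the notation (equilibrium $x^\ast$, modulus $\beta$, constant $C$) is assumed for illustration rather than copied from the paper.

```latex
% Schematic form of the O(e^{-beta t}) convergence claimed in entry 35,
% under a relative strong monotonicity assumption with modulus beta.
% x(t): mirror-descent trajectory; x*: interior Nash equilibrium;
% C: a constant depending on the initialization (assumed notation).
\[
  \| x(t) - x^{\ast} \| \le C\, e^{-\beta t}, \qquad t \ge 0 .
\]
```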
36. A Framework for Dynamic Decision Making by Multi-agent Cooperative Fault Pair Algorithm (MCFPA) in Retail Shop Application
- Author
-
Vidhate, Deepak A. and Kulkarni, Parag
- Published
- 2019
- Full Text
- View/download PDF
37. Impact of Neighboring Agent’s Characteristics with Q-Learning in Network Multi-agent System
- Author
-
Kaur, Harjot and Devi, Ginni
- Published
- 2019
- Full Text
- View/download PDF
38. Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination
- Author
-
Han, Dongge, Böhmer, Wendelin, Wooldridge, Michael, and Rogers, Alex
- Published
- 2019
- Full Text
- View/download PDF
39. User-Centric Radio Access Technology Selection: A Survey of Game Theory Models and Multi-Agent Learning Algorithms
- Author
-
Giuseppe Caso, Ozgu Alay, Guido Carlo Ferrante, Luca De Nardis, Maria-Gabriella Di Benedetto, and Anna Brunstrom
- Subjects
- Radio access technology selection, game theory, multi-agent learning, reinforcement learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
User-centric radio access technology (RAT) selection is a key communication paradigm, given the increased number of available RATs and increased cognitive capabilities at the user end. When considered against traditional network-centric approaches, user-centric RAT selection results in reduced network-side management load and leads to lower operational costs for RATs, as well as improved quality of service (QoS) and quality of experience (QoE) for users. The complex interactions between users involved in RAT selection require, however, specific analyses toward developing reliable and efficient schemes. Two theoretical frameworks are most often applied to user-centric RAT selection analysis, i.e., game theory (GT) and multi-agent learning (MAL). As a consequence, several GT models and MAL algorithms have been recently proposed to solve the problem at hand. A comprehensive discussion of such models and algorithms is, however, currently missing. Moreover, novel issues introduced by next-generation communication systems also need to be addressed. This paper proposes to fill the above gaps by providing a unified reference for both ongoing research and future research directions in the field. In particular, the review addresses the most common GT and MAL models and algorithms, and scenario settings adopted in user-centric RAT selection in terms of utility function and network topology. Regarding GT, the review focuses on non-cooperative models, because of their widespread use in RAT selection; as for MAL, a large number of algorithms are described, ranging from game-theoretic to reinforcement learning (RL) schemes, and also including most recent approaches, such as deep RL (DRL) and multi-armed bandit (MAB). Models and algorithms are analyzed by comparatively reviewing relevant literature. Finally, open challenges are discussed, in light of ongoing research and standardization activities.
- Published
- 2021
- Full Text
- View/download PDF
40. Reward-based epigenetic learning algorithm for a decentralised multi-agent system
- Author
-
Mukhlish, Faqihza, Page, John, and Bain, Michael
- Published
- 2020
- Full Text
- View/download PDF
41. Dynamical systems as a level of cognitive analysis of multi-agent learning: Algorithmic foundations of temporal-difference learning dynamics.
- Author
-
Barfuss, Wolfram
- Subjects
- *DYNAMICAL systems, *COGNITIVE analysis, *REINFORCEMENT learning, *GAME theory, *MULTIAGENT systems
- Abstract
A dynamical systems perspective on multi-agent learning, based on the link between evolutionary game theory and reinforcement learning, provides an improved, qualitative understanding of the emerging collective learning dynamics. However, confusion exists with respect to how this dynamical systems account of multi-agent learning should be interpreted. In this article, I propose to embed the dynamical systems description of multi-agent learning into different abstraction levels of cognitive analysis. The purpose of this work is to make the connections between these levels explicit in order to gain improved insight into multi-agent learning. I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning. I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty. I then propose an on-line sample-batch temporal-difference algorithm which is characterized by the combination of applying a memory-batch and separated state-action value estimation. I find that this algorithm serves as a micro-foundation of the deterministic learning equations by showing that its learning trajectories approach the ones of the deterministic learning equations under large batch sizes. Ultimately, this framework of embedding a dynamical systems description into different abstraction levels gives guidance on how to unleash the full potential of the dynamical systems approach to multi-agent learning. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
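Entry 41 proposes an on-line sample-batch temporal-difference algorithm whose learning trajectories approach the deterministic learning equations for large batch sizes. A minimal tabular sketch with assumed constants:

```python
# Minimal sketch of the sample-batch TD idea from entry 41: transitions
# accumulated in a memory batch update the value estimates in one
# averaged step, so larger batches move the update toward the
# deterministic learning equations. Tabular form and constants assumed.
import numpy as np

def batch_td_update(values, batch, lr=0.1, gamma=0.95):
    """values: (n_states,) estimates; batch: list of (s, r, s_next)."""
    deltas = np.zeros_like(values)
    counts = np.zeros(len(values))
    for s, r, s_next in batch:
        deltas[s] += r + gamma * values[s_next] - values[s]  # TD error
        counts[s] += 1
    mask = counts > 0
    values[mask] += lr * deltas[mask] / counts[mask]  # averaged update
    return values

v = batch_td_update(np.zeros(3), [(0, 1.0, 1), (0, 0.5, 2), (1, 0.0, 2)])
```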
42. Multi-agent deep reinforcement learning: a survey.
- Author
-
Gronauer, Sven and Diepold, Klaus
- Subjects
- DEEP learning, REINFORCEMENT learning, MACHINE learning
- Abstract
The advances in reinforcement learning have recorded sublime success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart during this progress, multi-agent reinforcement learning gains rapid traction, and the latest accomplishments address problems with real-world complexity. This article provides an overview of the current developments in the field of multi-agent deep reinforcement learning. We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario. To survey the works that constitute the contemporary landscape, the main contents are divided into three parts. First, we analyze the structure of training schemes that are applied to train multiple agents. Second, we consider the emergent patterns of agent behavior in cooperative, competitive and mixed scenarios. Third, we systematically enumerate challenges that exclusively arise in the multi-agent domain and review methods that are leveraged to cope with these challenges. To conclude this survey, we discuss advances, identify trends, and outline possible directions for future work in this research area. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
43. Spatial Positioning Token (SPToken) for Smart Mobility.
- Author
-
Overko, Roman, Ordonez-Hurtado, Rodrigo, Zhuk, Sergiy, Ferraro, Pietro, Cullen, Andrew, and Shorten, Robert
- Abstract
We introduce a permissioned distributed ledger technology (DLT) design for crowdsourced smart mobility applications. This architecture is based on a directed acyclic graph architecture (similar to the IOTA tangle) and uses both Proof-of-Work and Proof-of-Position mechanisms to provide protection against spam attacks and malevolent actors. In addition to enabling individuals to retain ownership of their data and to monetize it, the architecture is also suitable for distributed privacy-preserving machine learning algorithms, is lightweight, and can be implemented in simple internet-of-things (IoT) devices. To demonstrate its efficacy, we apply this framework to reinforcement learning settings where a third party is interested in acquiring information from agents. In particular, one may be interested in sampling an unknown vehicular traffic flow in a city, using a DLT-type architecture and without perturbing the density, with the idea of realizing a set of virtual tokens as surrogates of real vehicles to explore geographical areas of interest. These tokens, whose authenticated position determines write access to the ledger, are thus used to emulate the probing actions of commanded (real) vehicles on a given planned route by “jumping” from a passing-by vehicle to another to complete the planned trajectory. Consequently, the environment stays unaffected (i.e., the autonomy of participating vehicles is not influenced by the algorithm), regardless of the number of emitted tokens. The design of such a DLT architecture is presented, and numerical results from large-scale simulations are provided to validate the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC).
- Author
-
Mondal, Washim Uddin, Agarwal, Mridul, Aggarwal, Vaneet, and Ukkusuri, Satish V.
- Subjects
- *REINFORCEMENT learning, *MARL, *MARGINAL distributions, *SAMPLING errors
- Abstract
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of (1) joint state and action distributions across all classes, (2) individual distributions of each class, and (3) marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1 = \mathcal{O}\!\left(\frac{\sqrt{|X|}+\sqrt{|U|}}{N_{\mathrm{pop}}}\sum_k \sqrt{N_k}\right)$, $e_2 = \mathcal{O}\!\left(\left[\sqrt{|X|}+\sqrt{|U|}\right]\sum_k \frac{1}{\sqrt{N_k}}\right)$, and $e_3 = \mathcal{O}\!\left(\left[\sqrt{|X|}+\sqrt{|U|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A$, $B$ are some constants and $|X|$, $|U|$ are the sizes of the state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j \in \{1,2,3\}$, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
45. The role of information structures in game-theoretic multi-agent learning.
- Author
-
Li, Tao, Zhao, Yuhan, and Zhu, Quanyan
- Subjects
- *
DATA structures , *REINFORCEMENT learning , *GAMIFICATION - Abstract
Multi-agent learning (MAL) studies how agents learn to behave optimally and adaptively from their experience when interacting with other agents in dynamic environments. The outcome of a MAL process is jointly determined by all agents' decision-making, so each agent must think strategically about others' sequential moves when planning future actions. These strategic interactions make MAL more than a direct extension of single-agent learning to multiple agents. With this strategic thinking, each agent aims to build a subjective model of others' decision-making from its observations. Such modeling is directly influenced by what agents can perceive during the learning process, which is called the information structure of the agent's learning. Because it determines the input to MAL processes, the information structure plays a significant role in the learning mechanisms of the agents. This review creates a taxonomy of MAL and establishes a unified and systematic way to understand MAL from the perspective of information structures. We define three fundamental components of MAL: the information structure (i.e., what the agent can observe), belief generation (i.e., how the agent forms a belief about others based on its observations), and policy generation (i.e., how the agent generates its policy based on its belief). In addition, this taxonomy enables the classification of a wide range of state-of-the-art algorithms into four categories based on the belief-generation mechanisms for the opponents: stationary, conjectured, calibrated, and sophisticated opponents. We introduce Value of Information (VoI) as a metric to quantify the impact of different information structures on MAL. Finally, we discuss the strengths and limitations of algorithms from the different categories and point to promising avenues of future research. [ABSTRACT FROM AUTHOR]
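As a minimal sketch of the three components named above, the snippet below wires an information structure (observed opponent actions), the simplest belief-generation category (a stationary opponent model), and best-response policy generation. The class and function names are our own illustrations, not the review's notation.

```python
# Observe -> generate belief -> generate policy, the MAL loop in miniature.
from collections import Counter
import random

class StationaryOpponentModel:
    """Belief generation for the simplest category: assume the opponent's
    mixed strategy is stationary and estimate it from observed actions."""
    def __init__(self, actions):
        self.actions = actions
        self.counts = Counter({a: 1 for a in actions})  # Laplace-smoothed

    def update(self, observed_action):   # information structure: actions are observed
        self.counts[observed_action] += 1

    def belief(self):                    # belief generation
        total = sum(self.counts.values())
        return {a: c / total for a, c in self.counts.items()}

def best_response(belief, payoff):       # policy generation
    # Pick the action maximizing expected payoff under the current belief.
    return max(payoff, key=lambda mine: sum(p * payoff[mine][theirs]
                                            for theirs, p in belief.items()))

# Toy matching-pennies-style interaction against a biased opponent.
payoff = {"H": {"H": 1, "T": -1}, "T": {"H": -1, "T": 1}}
model = StationaryOpponentModel(["H", "T"])
for _ in range(200):
    opp = random.choices(["H", "T"], weights=[0.7, 0.3])[0]
    model.update(opp)
print("belief:", model.belief(), "-> play:", best_response(model.belief(), payoff))
```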
- Published
- 2022
- Full Text
- View/download PDF
46. Multi-Agent Learning-Based Nearly Non-Iterative Stochastic Dynamic Transactive Energy Control of Networked Microgrids.
- Author
-
Pan, Zhenning, Yu, Tao, Li, Jie, Wu, Yufeng, Chen, Junbin, Lu, Jidong, and Zhang, Xiaoshun
- Abstract
Coordination of networked microgrids (MGs) offers a promising way to exploit the flexibility of distributed resources and accommodate renewable energy. This paper studies real-time coordination between the distribution system operator (DSO) and MGs under multivariate uncertainty. Existing research suffers from inadaptability to dynamic system uncertainty, extensive iterations, and dependence on prediction. To fill these gaps, a novel multi-agent learning-based stochastic dynamic programming (MASDP) method is proposed to obtain the optimal policy for each entity. Specifically, a transactive energy control (TEC) mechanism, which requires only market-based information interactions, is employed to facilitate coordination between the MGs and the DSO. A data-driven offline self-learning scheme is proposed so that entities learn how to manage resources in response to system uncertainty. After sufficient offline learning, the online operation of MASDP can run in both non-iterative and iterative manners, by which near-optimal or optimal real-time solutions for the DSO and MGs are obtained sequentially and in a distributed fashion. Numerical comparisons with state-of-the-art policies and TEC algorithms verify the optimality, efficiency, adaptability, and scalability of MASDP. [ABSTRACT FROM AUTHOR]
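The non-iterative online mode described above can be pictured as a single price-broadcast/response exchange per step. The sketch below is a schematic stand-in under our own assumptions: the linear microgrid response (mg_policy) and congestion-based price rule (dso_price) are invented illustrations, not the paper's learned MASDP policies.

```python
# Schematic transactive energy control exchange: DSO posts a price,
# each microgrid answers with a schedule from its (offline-learned) policy.

def mg_policy(price, flexibility=10.0, baseline=50.0):
    # Stand-in for an offline-learned response: consume less as price rises.
    return max(0.0, baseline - flexibility * price)

def dso_price(total_demand, capacity=120.0):
    # Price rises with congestion on the distribution system.
    return max(0.1, total_demand / capacity)

price = 0.5
for step in range(5):
    # One forward pass per step: no inner iteration loop is needed online.
    demands = [mg_policy(price) for _ in range(3)]
    price = dso_price(sum(demands))
    print(f"step {step}: demands={demands}, next price={price:.3f}")
```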
- Published
- 2022
- Full Text
- View/download PDF
47. Multi‐agent reinforcement learning via knowledge transfer with differentially private noise.
- Author
-
Cheng, Zishuo, Ye, Dayong, Zhu, Tianqing, Zhou, Wanlei, Yu, Philip S., and Zhu, Congcong
- Subjects
KNOWLEDGE transfer ,REINFORCEMENT learning ,TRANSFER of training ,LEARNING problems ,NOISE ,INSTRUCTIONAL systems - Abstract
In multi-agent reinforcement learning, transfer learning is one of the key techniques used to speed up learning through the exchange of knowledge among agents. However, three challenges arise when applying this technique to real-world problems. First, most real-world domains are partially rather than fully observable. Second, it is difficult to pre-collect knowledge in unknown domains. Third, negative transfer impedes learning progress. We observe that differentially private mechanisms can overcome these challenges thanks to their randomization property. We therefore propose a novel differential transfer learning method for multi-agent reinforcement learning, characterized by three key features. First, our method allows agents to transfer knowledge to each other in real time in partially observable domains. Second, it eliminates constraints on the relevance of transferred knowledge, which greatly expands the usable knowledge set. Third, it improves robustness to negative transfer by applying differentially private exponential noise and relevance weights to the transferred knowledge. The proposed method is the first to use the randomization property of differential privacy to improve learning performance in multi-agent reinforcement learning systems. We further conduct extensive experiments to demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
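The "differentially private exponential noise" ingredient can be illustrated with the standard exponential mechanism, where a piece of advice is sampled with probability proportional to exp(ε·relevance / 2Δ), so the privacy randomization doubles as exploration that softens negative transfer. The snippet below is a minimal sketch under that reading; the advice items, relevance scores, and ε values are invented for illustration.

```python
# Exponential-mechanism selection of advice to transfer between agents.
import math
import random
from collections import Counter

def exponential_mechanism(candidates, relevance, epsilon, sensitivity=1.0):
    # Pr[advice] proportional to exp(eps * relevance / (2 * sensitivity)).
    weights = [math.exp(epsilon * relevance[c] / (2 * sensitivity)) for c in candidates]
    return random.choices(candidates, weights=weights)[0]

advice = ["go_left", "go_right", "wait"]
relevance = {"go_left": 0.9, "go_right": 0.4, "wait": 0.1}  # similarity to own task
for eps in (0.1, 1.0, 10.0):
    picks = Counter(exponential_mechanism(advice, relevance, eps) for _ in range(1000))
    print(f"epsilon={eps}: {picks}")  # low eps -> near-uniform (more exploratory),
                                      # high eps -> mostly the most relevant advice
```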
- Published
- 2022
- Full Text
- View/download PDF
48. Machine Learning Empowered Spectrum Sharing in Intelligent Unmanned Swarm Communication Systems: Challenges, Requirements and Solutions
- Author
-
Wang, Ximing, Xu, Yuhua, Chen, Chaohui, Yang, Xiaoqin, Chen, Jiaxin, Ruan, Lang, Xu, Yifan, and Chen, Runfeng
- Subjects
Unmanned swarm system ,spectrum sharing ,machine learning ,multi-agent learning ,game theory ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The unmanned swarm system (USS) is seen as a promising technology that will play an important role in both military and civilian fields, such as military strikes, disaster relief, and transportation. As the “nerve center” of the USS, the unmanned swarm communication system (USCS) provides the information transmission medium needed to ensure system stability and mission execution. However, challenges posed by multiple tasks, distributed collaboration, high dynamics, ultra-dense deployment, and jamming threats make it hard for the USCS to manage limited spectrum resources. To tackle these problems, this paper introduces machine learning (ML)-empowered intelligent spectrum management. First, based on the challenges of spectrum resource management in the USCS, the requirements for spectrum sharing are analyzed from the perspectives of spectrum collaboration and spectrum confrontation. We find that suitable multi-agent collaborative decision-making is promising for effective spectrum sharing from both perspectives. We therefore propose a multi-agent learning framework comprising mobile-computing-assisted and distributed structures, and provide case studies based on this framework. Finally, future research directions are discussed.
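As a concrete, hedged illustration of multi-agent decision-making for spectrum sharing, the sketch below uses independent stateless Q-learners that are penalized when they collide on a channel. It is a generic baseline of our own construction, not the framework proposed in the paper; constants such as N_CHANNELS are illustrative.

```python
# Each swarm node independently learns a channel choice with epsilon-greedy
# stateless Q-learning; collisions (shared channels) yield negative reward.
import random

N_AGENTS, N_CHANNELS, EPS, ALPHA = 4, 4, 0.1, 0.2
Q = [[0.0] * N_CHANNELS for _ in range(N_AGENTS)]

def choose(qrow):
    if random.random() < EPS:                      # epsilon-greedy exploration
        return random.randrange(N_CHANNELS)
    return max(range(N_CHANNELS), key=qrow.__getitem__)

for step in range(5000):
    picks = [choose(Q[i]) for i in range(N_AGENTS)]
    for i, ch in enumerate(picks):
        reward = 1.0 if picks.count(ch) == 1 else -1.0  # collision -> jamming-like loss
        Q[i][ch] += ALPHA * (reward - Q[i][ch])         # stateless Q update

# With enough steps the agents settle on an orthogonal channel allocation.
print("learned channels:", [max(range(N_CHANNELS), key=Q[i].__getitem__)
                            for i in range(N_AGENTS)])
```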
- Published
- 2020
- Full Text
- View/download PDF
49. Dynamic opponent modelling in two-player games
- Author
-
Mealing, Richard Andrew, Shapiro, Jonathan, and Brown, Gavin
- Subjects
006.3 ,decision-making ,imperfect information ,learning in games ,dynamic opponents ,opponent modelling ,sequence prediction ,change detection ,expectation-maximisation ,reinforcement learning ,lookahead ,best-response ,Nash equilibrium ,self-play convergence ,iterated normal-form games ,simplified poker ,multi-agent learning ,game theory - Abstract
This thesis investigates decision-making in two-player imperfect-information games against opponents whose actions can affect our rewards and whose strategies may be based on memories of interaction, may be changing, or both. The focus is on modelling these dynamic opponents and using the models to learn high-reward strategies. The main contributions of this work are:
1. An approach to learning high-reward strategies in small simultaneous-move games against these opponents, by using an opponent model learnt from sequence prediction, with (possibly discounted) rewards learnt from reinforcement learning, to look ahead via explicit tree search. Empirical results show that this earns higher average rewards per game than state-of-the-art reinforcement learning agents in three simultaneous-move games, and that several sequence prediction methods model these opponents effectively, supporting the idea of borrowing such methods from areas like data compression and string matching.
2. An online expectation-maximisation algorithm that infers an agent's hidden information from its behaviour in imperfect-information games.
3. An approach to learning high-reward strategies in medium-size sequential-move poker games against these opponents, by using an opponent model learnt from sequence prediction, which requires the hidden information inferred by the online expectation-maximisation algorithm, to train a state-of-the-art no-regret learning algorithm via simulated games between the algorithm and the model. Empirical results show that this improves the no-regret learner's rewards against popular and state-of-the-art algorithms in two simplified poker games.
4. A demonstration that several change detection methods can effectively model changing categorical distributions, with experimental results comparing their accuracy against empirical distributions. These results also show that their models can outperform state-of-the-art reinforcement learning agents in two simultaneous-move games, supporting the idea of modelling changing opponent strategies with change detection methods.
5. Experimental results on the self-play convergence of the empirical distributions of play of sequence prediction and change detection methods to mixed-strategy Nash equilibria, showing that they converge faster than fictitious play, and in more cases for change detection.
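As a hedged sketch of the first contribution, the snippet below stands in a simple order-1 Markov predictor for the thesis's sequence-prediction methods and uses one-step lookahead (rather than full tree search) against a cyclic opponent in rock-paper-scissors; all names are our own illustrations.

```python
# Model the opponent with a sequence predictor, then best-respond to its
# predicted next move.
import random
from collections import defaultdict, Counter

class MarkovPredictor:
    """Predict the opponent's next move from its previous move."""
    def __init__(self, actions):
        self.actions = actions
        self.table = defaultdict(Counter)
        self.prev = None

    def observe(self, action):
        if self.prev is not None:
            self.table[self.prev][action] += 1
        self.prev = action

    def predict(self):
        counts = self.table[self.prev]
        if not counts:
            return {a: 1 / len(self.actions) for a in self.actions}
        total = sum(counts.values())
        return {a: counts[a] / total for a in self.actions}

# Rock-paper-scissors payoffs for the row player.
PAYOFF = {("R", "S"): 1, ("S", "P"): 1, ("P", "R"): 1,
          ("S", "R"): -1, ("P", "S"): -1, ("R", "P"): -1,
          ("R", "R"): 0, ("P", "P"): 0, ("S", "S"): 0}
ACTIONS = ["R", "P", "S"]

model, score = MarkovPredictor(ACTIONS), 0
opp_prev = "R"
for _ in range(2000):
    belief = model.predict()
    mine = max(ACTIONS, key=lambda a: sum(p * PAYOFF[(a, o)] for o, p in belief.items()))
    opp = {"R": "P", "P": "S", "S": "R"}[opp_prev]  # a cyclic (dynamic) opponent
    score += PAYOFF[(mine, opp)]
    model.observe(opp)
    opp_prev = opp
print("average reward vs cyclic opponent:", score / 2000)
```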
- Published
- 2015
50. A Theoretical Framework for Large-Scale Human-Robot Interaction with Groups of Learning Agents.
- Author
-
Teh, Nicholas, Hu, Shuyue, and Soh, Harold
- Subjects
HUMAN-robot interaction ,SOCIAL interaction ,OPEN-ended questions ,ROBOTS - Abstract
Recent advances in robot capabilities have led to a growing consensus that robots will eventually be deployed at scale across numerous application domains. An important open question is how humans and robots will adapt to one another over time. In this paper, we introduce the model-based Theoretical Human-Robot Scenarios (THuS) framework, capable of elucidating the interactions between large groups of humans and learning robots. We formally establish THuS and consider its application to a human-robot variant of the n-player coordination game, demonstrating the power of the theoretical framework as a tool to qualitatively understand and quantitatively compare HRI scenarios involving different agent types. We also discuss the framework's limitations and potential. Our work provides the HRI community with a versatile tool that permits first-cut insights into large-scale HRI scenarios that are too costly or challenging to carry out in simulation or in the real world. [ABSTRACT FROM AUTHOR]
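To make the setting concrete, the following toy simulation, entirely our own construction and not the THuS formalism, shows a large group of convention-following humans and reinforcement-style robots converging in an n-player coordination game where everyone is rewarded by the share of the population matching their choice.

```python
# Humans imitate the majority occasionally; robots reinforce whichever
# option coordinated better, so mixed human-robot groups reach a convention.
import random

N_HUMANS, N_ROBOTS, STEPS, ALPHA = 80, 20, 300, 0.1
options = [0, 1]
humans = [random.choice(options) for _ in range(N_HUMANS)]  # sticky conventions
robots = [[0.5, 0.5] for _ in range(N_ROBOTS)]              # learned preferences

for _ in range(STEPS):
    robot_picks = [random.choices(options, weights=w)[0] for w in robots]
    pop = humans + robot_picks
    share = [pop.count(o) / len(pop) for o in options]
    # Robots reinforce the option that coordinated better this round.
    for w, pick in zip(robots, robot_picks):
        w[pick] += ALPHA * share[pick]
        s = sum(w)
        w[0], w[1] = w[0] / s, w[1] / s
    # Humans occasionally switch to the current majority convention.
    humans = [max(options, key=lambda o: share[o]) if random.random() < 0.05 else h
              for h in humans]

print("final convention counts:", [(humans + robot_picks).count(o) for o in options])
```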
- Published
- 2021
- Full Text
- View/download PDF