293 results for "multi-agent learning"
Search Results
2. Acquisition of Cooperative Control of Multiple Vehicles Through Reinforcement Learning Utilizing Vehicle-to-Vehicle Communication and Map Information.
- Author
-
Suzuki, Tenta, Matsuda, Kenji, Kumagae, Kaito, Tobisawa, Mao, Hoshino, Junya, Itoh, Yuki, Harada, Tomohiro, Matsuoka, Jyouhei, Kagawa, Toshinori, and Hattori, Kiyohiko
- Subjects
- *MAPS, *INFORMATION resources management, *REINFORCEMENT learning
- Abstract
In recent years, extensive research has been conducted on the practical applications of autonomous driving. Much of this research relies on existing road infrastructure and aims to replace and automate human drivers. Concurrently, studies on zero-based control optimization focus on the effective use of road resources without assuming the presence of car lanes. These studies often overlook the physical constraints of vehicles in their control optimization based on reinforcement learning, leading to the learning of unrealistic control behaviors while simplifying the implementation of ranging sensors and vehicle-to-vehicle communication. Additionally, these studies do not use map information, which is widely employed in autonomous driving research. To address these issues, we constructed a simulation environment that incorporates physics simulations, realistically implements ranging sensors and vehicle-to-vehicle communication, and actively employs map information. Using this environment, we evaluated the effect of vehicle-to-vehicle communication and map information on vehicle control learning. Our experimental results show that vehicle-to-vehicle communication reduces collisions, while the use of map information improves the average vehicle speed and reduces the average lap time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Enhanced Naive Agent in Angry Birds AI Competition via Exploitation-Oriented Learning.
- Author
-
Miyazaki, Kazuteru
- Subjects
- *ARTIFICIAL intelligence, *REINFORCEMENT learning, *INTELLIGENCE officers, *GAME & game-birds, *CONTESTS, *PROFIT-sharing
- Abstract
The Angry Birds AI Competition engages artificial intelligence agents in a contest based on the game Angry Birds. This tournament has been conducted annually since 2012, with participants competing for high scores. The organizers of this competition provide a basic agent, termed "Naive Agent," as a baseline indicator. This study enhanced the Naive Agent by integrating a profit-sharing approach known as exploitation-oriented learning, which is a type of experience-enhanced learning. The effectiveness of this method was substantiated through numerical experiments. Additionally, this study explored the use of level selection learning within a multi-agent environment and validated the utility of the rationality theorem concerning the indirect rewards in this environment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Staged Reinforcement Learning for Complex Tasks Through Decomposed Environments
- Author
-
Pina, Rafael, Artaud, Corentin, Liu, Xiaolan, and De Silva, Varuna
- Published
- 2024
- Full Text
- View/download PDF
5. Research Progress in Multi-Agent Game Learning.
- Author
-
罗俊仁, 张万鹏, 苏炯铭, 袁唯淋, and 陈璟
- Published
- 2024
- Full Text
- View/download PDF
6. Deep-silicon photon-counting x-ray projection denoising through reinforcement learning.
- Author
-
Tanveer, Md Sayed, Wiedeman, Christopher, Li, Mengzhou, Shi, Yongyi, De Man, Bruno, Maltz, Jonathan S., and Wang, Ge
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *PHOTON counting, *ARTIFICIAL intelligence, *X-rays, *SPATIAL resolution
- Abstract
BACKGROUND: In recent years, deep reinforcement learning (RL) has been applied to various medical tasks and produced encouraging results. OBJECTIVE: In this paper, we demonstrate the feasibility of deep RL for denoising simulated deep-silicon photon-counting CT (PCCT) data in both full and interior scan modes. PCCT offers higher spatial and spectral resolution than conventional CT, requiring advanced denoising methods to suppress noise increase. METHODS: In this work, we apply a dueling double deep Q network (DDDQN) to denoise PCCT data for maximum contrast-to-noise ratio (CNR) and a multi-agent approach to handle data non-stationarity. RESULTS: Using our method, we obtained significant image quality improvement for single-channel scans and consistent improvement for all three channels of multichannel scans. For the single-channel interior scans, the PSNR (dB) and SSIM increased from 33.4078 and 0.9165 to 37.4167 and 0.9790 respectively. For the multichannel interior scans, the channel-wise PSNR (dB) increased from 31.2348, 30.7114, and 30.4667 to 31.6182, 30.9783, and 30.8427 respectively. Similarly, the SSIM improved from 0.9415, 0.9445, and 0.9336 to 0.9504, 0.9493, and 0.0326 respectively. CONCLUSIONS: Our results show that the RL approach improves image quality effectively, efficiently, and consistently across multiple spectral channels and has great potential in clinical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
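The dueling double deep Q-network (DDDQN) named in entry 6 combines a dueling value/advantage decomposition with double-DQN target evaluation. A minimal generic sketch of those two ingredients follows; layer sizes, inputs, and the action set are illustrative assumptions, not the authors' denoising implementation.

```python
# Generic dueling double-DQN sketch (the "DDDQN" ingredient in entry 6).
# Layer sizes, the feature input, and the action set are illustrative
# assumptions, not the authors' denoising implementation.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)               # state-value stream
        self.advantage = nn.Linear(64, n_actions)   # advantage stream

    def forward(self, x):
        h = self.trunk(x)
        a = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

def double_dqn_target(online, target, reward, next_obs, gamma=0.99):
    """Double DQN: the online net picks the action, the target net rates it."""
    with torch.no_grad():
        next_a = online(next_obs).argmax(dim=-1, keepdim=True)
        return reward + gamma * target(next_obs).gather(-1, next_a).squeeze(-1)
```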
7. Hierarchical Reward Model of Deep Reinforcement Learning for Enhancing Cooperative Behavior in Automated Driving.
- Author
-
Matsuda, Kenji, Suzuki, Tenta, Harada, Tomohiro, Matsuoka, Johei, Tobisawa, Mao, Hoshino, Jyunya, Itoh, Yuuki, Kumagae, Kaito, Kagawa, Toshinori, and Hattori, Kiyohiko
- Subjects
- *DEEP reinforcement learning, *REINFORCEMENT learning, *REWARD (Psychology), *GROUP work in education, *AUTOMOBILE driving, *SEARCHING behavior, *MOTOR vehicle driving
- Abstract
In recent years, studies on the practical application of automated driving have been conducted extensively. Most of this research assumes the existing road infrastructure and aims to replace human driving. There have also been studies that use reinforcement learning to optimize car control from a zero-based perspective in an environment without lanes, one of the existing road types. In those studies, search and behavior acquisition using reinforcement learning have resulted in efficient driving control in an unknown environment. However, the throughput has not been high, while the crash rate has. To address this issue, this study proposes a hierarchical reward model that uses both individual and common rewards for reinforcement learning, in order to achieve efficient driving control on roads we assume to be one-way, lane-less, and automobile-only. Automated driving control is trained using the hierarchical reward model and evaluated through physical simulations. The results show that a reduction in crash rate and an improvement in throughput are attained by increasing the number of behaviors in which faster cars actively overtake slower ones. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
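The hierarchical reward model in entry 7 mixes individual and common rewards into one learning signal. A toy sketch under assumed weights and reward terms, not the paper's exact design:

```python
# Toy sketch of the hierarchical reward idea from entry 7: blend a
# per-vehicle term (own speed/progress) with a group-level term (e.g.,
# collision-free step). Weights and terms are illustrative assumptions.
def hierarchical_reward(individual_reward: float,
                        common_reward: float,
                        w_individual: float = 0.5,
                        w_common: float = 0.5) -> float:
    """Blend per-agent and group-level rewards into one scalar signal."""
    return w_individual * individual_reward + w_common * common_reward

# Example: a fast lap (individual) during a collision-free step (common).
r = hierarchical_reward(individual_reward=1.0, common_reward=0.2)
```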
8. Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing
- Author
-
Hadar Szostak and Kobi Cohen
- Subjects
- Active hypothesis testing (AHT), controlled sensing for multihypothesis testing, decentralized inference, deep reinforcement learning (DRL), multi-agent learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
We consider a decentralized formulation of the active hypothesis testing (AHT) problem, where multiple agents gather noisy observations from the environment with the purpose of identifying the correct hypothesis. At each time step, agents have the option to select a sampling action. These different actions result in observations drawn from various distributions, each associated with a specific hypothesis. The agents collaborate to accomplish the task, where message exchanges between agents are allowed over a rate-limited communications channel. The objective is to devise a multi-agent policy that minimizes the Bayes risk. This risk comprises both the cost of sampling and the joint terminal cost incurred by the agents upon making a hypothesis declaration. Deriving optimal structured policies for AHT problems is generally mathematically intractable, even in the context of a single agent. As a result, recent efforts have turned to deep learning methodologies to address these problems, which have exhibited significant success in single-agent learning scenarios. In this paper, we tackle the multi-agent AHT formulation by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. This algorithm, named Multi-Agent Reinforcement Learning for AHT (MARLA), operates at each time step by having each agent map its state to an action (sampling rule or stopping rule) using a trained deep neural network with the goal of minimizing the Bayes risk. We present a comprehensive set of experimental results that effectively showcase the agents’ ability to learn collaborative strategies and enhance performance using MARLA. Furthermore, we demonstrate the superiority of MARLA over single-agent learning approaches. Finally, we provide an open-source implementation of the MARLA framework, for the benefit of researchers and developers in related domains.
- Published
- 2024
- Full Text
- View/download PDF
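Entry 8's MARLA loop has each agent map its local state through a trained network to either a sampling action or the stopping (declaration) action. A minimal sketch with an assumed toy network and action layout:

```python
# Minimal sketch of the per-step MARLA decision from entry 8: the state
# maps through a trained network to a sampling rule or a stopping rule.
# The network, state encoding, and action layout are assumptions.
import numpy as np

N_SAMPLING_ACTIONS = 4
STOP_ACTION = N_SAMPLING_ACTIONS  # last index declares a hypothesis

def agent_step(q_network, state: np.ndarray) -> int:
    """Greedy action: a sampling rule or the stopping rule."""
    q_values = q_network(state)      # shape: (N_SAMPLING_ACTIONS + 1,)
    return int(np.argmax(q_values))

# Toy stand-in for a trained network, for illustration only.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, N_SAMPLING_ACTIONS + 1))
q_network = lambda s: s @ weights
action = agent_step(q_network, rng.normal(size=8))
```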
9. Scaling Up Multi-Agent Reinforcement Learning: An Extensive Survey on Scalability Issues
- Author
-
Dingbang Liu, Fenghui Ren, Jun Yan, Guoxin Su, Wen Gu, and Shohei Kato
- Subjects
- Multi-agent learning, reinforcement learning, scalability, collective learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Multi-agent learning has made significant strides in recent years. Benefiting from deep learning, multi-agent deep reinforcement learning (MADRL) has transcended traditional limitations seen in tabular tasks, arousing tremendous research interest. However, compared to other challenges in MADRL, scalability remains underemphasized, impeding the application of MADRL in complex scenarios. Scalability stands as a foundational attribute of the multi-agent system (MAS), offering a potent approach to understand and improve collective learning among agents. It encompasses the capacity to handle the increasing state-action space which arises not only from a large number of agents but also from other factors related to agents and environment. In contrast to prior surveys, this work provides a comprehensive exposition of scalability concerns in MADRL. We first introduce foundational knowledge about deep reinforcement learning and MADRL to underscore the distinctiveness of scalability issues in this domain. Subsequently, we delve into the problems posed by scalability, examining agent complexity, environment complexity, and robustness against perturbation. We elaborate on the methods that demonstrate the evolution of scalable algorithms. To conclude this survey, we discuss challenges, identify trends, and outline possible directions for future work on scalability issues. It is our aspiration that this survey enhances the understanding of researchers in this field, providing a valuable resource for in-depth exploration.
- Published
- 2024
- Full Text
- View/download PDF
10. A New Distributed Architecture Based on Reinforcement Learning for Parameter Estimation in Image Processing
- Author
-
Qaffou, Issam
- Published
- 2023
- Full Text
- View/download PDF
11. On Implementing a Simulation Environment for a Cooperative Multi-agent Learning Approach to Mitigate DRDoS Attacks
- Author
-
Kawazoe, Tomoki and Fukuta, Naoki
- Published
- 2023
- Full Text
- View/download PDF
12. Learning in the Presence of Multiple Agents
- Author
-
Ramponi, Giorgia
- Published
- 2023
- Full Text
- View/download PDF
13. Cooperative Multi-Agent Nash Q-Learning (CMNQL) for Decision Building in Retail Shop
- Author
-
Vidhate, Deepak A. and Kulkarni, Parag
- Published
- 2023
- Full Text
- View/download PDF
14. Emergent cooperation from mutual acknowledgment exchange in multi-agent reinforcement learning
- Author
-
Phan, Thomy, Sommer, Felix, Ritz, Fabian, Altmann, Philipp, Nüßlein, Jonas, Kölle, Michael, Belzner, Lenz, and Linnhoff-Popien, Claudia
- Published
- 2024
- Full Text
- View/download PDF
15. A deep reinforcement learning strategy for autonomous robot flocking.
- Author
-
Martínez, Fredy, Montiel, Holman, and Wanumen, Luis
- Subjects
- DEEP reinforcement learning, AUTONOMOUS robots, REINFORCEMENT learning, ANIMAL social behavior, MULTIAGENT systems, INTELLIGENCE levels
- Abstract
Social behaviors in animals such as bees, ants, and birds have shown high levels of intelligence from a multi-agent system perspective. They present viable solutions to real-world problems, particularly in navigating constrained environments with simple robotic platforms. Among these behaviors is swarm flocking, which has been extensively studied for this purpose. Flocking algorithms have been developed from basic behavioral rules, which often require parameter tuning for specific applications. However, the lack of a general formulation for tuning has made these strategies difficult to implement in various real conditions, and even to replicate laboratory behaviors. In this paper, we propose a flocking scheme for small autonomous robots that can self-learn in dynamic environments, derived from a deep reinforcement learning process. Our approach achieves flocking independently of population size and environmental characteristics, with minimal external intervention. Our multi-agent system model considers each agent’s action as a linear function dynamically adjusting the motion according to interactions with other agents and the environment. Our strategy is an important contribution toward real-world flocking implementation. We demonstrate that our approach allows for autonomous flocking in the system without requiring specific parameter tuning, making it ideal for applications where there [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
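Entry 15 models each agent's action as a linear function that dynamically adjusts motion according to interactions with other agents and the environment. A minimal sketch with boids-style features and assumed coefficients standing in for what deep RL would learn:

```python
# Minimal sketch of the linear per-agent action model from entry 15.
# The cohesion/alignment/separation features and the weights are
# illustrative assumptions, not the paper's trained policy.
import numpy as np

def agent_action(pos, vel, neighbour_pos, neighbour_vel, weights):
    """Linear policy: weighted sum of interaction features."""
    cohesion = neighbour_pos.mean(axis=0) - pos      # move toward group
    alignment = neighbour_vel.mean(axis=0) - vel     # match group velocity
    separation = (pos - neighbour_pos).mean(axis=0)  # avoid crowding
    features = np.stack([cohesion, alignment, separation])  # (3, dim)
    return weights @ features                        # motion adjustment

weights = np.array([0.4, 0.3, 0.2])  # coefficients deep RL would learn
rng = np.random.default_rng(2)
action = agent_action(rng.normal(size=2), rng.normal(size=2),
                      rng.normal(size=(5, 2)), rng.normal(size=(5, 2)),
                      weights)
```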
16. Fighting for Routes: Resource Allocation among Competing Planners in Transportation Networks.
- Author
-
Roman, Charlotte and Turrini, Paolo
- Subjects
- *RESOURCE allocation, *PLANNERS, *COST functions, *SOCIAL services, *PRICES, *REINFORCEMENT learning, *ROUTE choice
- Abstract
In transportation networks, incomplete information is ubiquitous, and users often delegate their route choice to distributed route planners. To model and study these systems, we introduce network control games, consisting of multiple actors seeking to optimise the social welfare of their assigned subpopulations through resource allocation in an underlying nonatomic congestion game. We first analyse the inefficiency of the routing equilibria by calculating the Price of Anarchy for polynomial cost functions, and then, using an Asynchronous Advantage Actor–Critic algorithm implementation, we show that reinforcement learning agents are vulnerable to choosing suboptimal routing as predicted by the theory. Finally, we extend the analysis to allow vehicles to choose their route planner and study the associated equilibria. Our results can be applied to mitigate inefficiency issues arising in large transport networks with route controlled autonomous vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Causal inference multi-agent reinforcement learning for traffic signal control.
- Author
-
Yang, Shantian, Yang, Bo, Zeng, Zheng, and Kang, Zhongfeng
- Subjects
- *TRAFFIC signs & signals, *REINFORCEMENT learning, *TRAFFIC engineering, *CAUSAL inference
- Abstract
• A Causal-Inference (CI) model is designed for the non-stationary multi-agent environment.
• Combining it with multi-agent learning, a CI-MA algorithm is proposed for traffic signal control.
• Different granularities of traffic information are fused for feature representation.
• A representation loss function and an MA loss function are designed for joint optimization.
• Experiments show that the CI-MA algorithm outperforms state-of-the-art algorithms.

A primary challenge in multi-agent reinforcement learning for traffic signal control is to produce effective cooperative traffic-signal policies in non-stationary multi-agent traffic environments. However, each agent suffers from its local non-stationary traffic environment caused by the time-varying traffic-signal policies of adjacent agents; at the same time, different agents also produce time-varying traffic-signal policies, which further results in the non-stationarity of the whole traffic environment, so the produced traffic-signal policies may be ineffective. In this work, we propose a Causal Inference Multi-Agent reinforcement learning (CI-MA) algorithm, which can alleviate the non-stationarity of multi-agent traffic environments from both feature representation and optimization, and eventually helps to produce effective cooperative traffic-signal policies. Specifically, a Causal-Inference (CI) model is first designed to reason about and tackle the non-stationarity of multi-agent traffic environments by both acquiring feature representation distributions and deriving variational lower bounds (i.e., objective functions). Then, based on the designed CI model, we propose a CI-MA algorithm, in which feature representations are acquired from the non-stationarity of multi-agent traffic environments at both the task level and the timestep level, and the acquired feature representations are used to produce cooperative traffic-signal policies and Q-values for multiple agents. Finally, the corresponding objective functions optimize the whole algorithm from both causal inference and multi-agent reinforcement learning. Experiments are conducted in different non-stationary multi-agent traffic environments. Results show that the CI-MA algorithm outperforms other state-of-the-art algorithms and demonstrate that the proposed algorithm, trained in synthetic-traffic environments, can be effectively transferred to both synthetic- and real-traffic environments with non-stationarity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Multi-agent adaptive routing by multi-head-attention-based twin agents using reinforcement learning
- Author
-
Timofey A. Gribanov, Andrey A. Filchenkov, Artur A. Azarov, and Anatoly A. Shalyto
- Subjects
- routing, multi-agent learning, reinforcement learning, adaptive routing, Optics. Light, QC350-467, Electronic computers. Computer science, QA75.5-76.95
- Abstract
A common condition, typical of packet routing, cargo transportation, and flow control problems, is the variability of the graph. Reinforcement-learning-based adaptive routing algorithms are designed to solve the routing problem under this condition. However, with significant changes in the graph, existing routing algorithms require complete retraining. To handle this challenge, we propose a novel method based on multi-agent modeling with twin agents, for which a new neural network architecture with multi-headed internal attention is proposed, pre-trained within the framework of the multi-view learning paradigm. An agent in such a paradigm uses a vertex as its input; twins of the main agent are placed at the vertices of the graph and select the neighbor to which the object should be transferred. We carried out a comparative analysis with the existing DQN-LE-routing multi-agent routing algorithm on two stages: pre-training and simulation. In both cases, runs were considered in which the topology changes during testing or simulation. Experiments have shown that the proposed adaptability enhancement method provides global adaptability, increasing delivery time by only 14.5% after global changes occur. The proposed method can be used to solve routing problems with complex path evaluation functions and dynamically changing graph topologies, for example, in transport logistics and for managing conveyor belts in production.
- Published
- 2022
- Full Text
- View/download PDF
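Entry 18 places twins of one shared agent at the vertices of the graph, each selecting the neighbour to which the object should be transferred. A minimal sketch in which a simple linear scorer stands in for the paper's multi-head internal-attention network:

```python
# Minimal sketch of the twin-agent routing step from entry 18: a shared
# policy replicated at every vertex scores each neighbour as the next
# hop. The linear scorer and feature layout are assumptions standing in
# for the paper's multi-head internal-attention architecture.
import numpy as np

def choose_next_hop(score_net, vertex_feat, neighbour_feats, dest_feat):
    """Score every neighbour with the shared network; route to argmax."""
    scores = [
        score_net(np.concatenate([vertex_feat, n, dest_feat]))
        for n in neighbour_feats
    ]
    return int(np.argmax(scores))

# Toy stand-in for the trained scorer, illustration only.
rng = np.random.default_rng(1)
w = rng.normal(size=12)
score_net = lambda x: float(w @ x)
hop = choose_next_hop(score_net, rng.normal(size=4),
                      [rng.normal(size=4) for _ in range(3)],
                      rng.normal(size=4))
```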
19. Joint learning of agents and graph embeddings in a conveyor belt control problem
- Author
-
Konstantin E. Rybkin, Andrey A. Filchenkov, Artur A. Azarov, Alexey S. Zabashta, and Anatoly A. Shalyto
- Subjects
- multi-agent learning, reinforcement learning, adaptive routing, conveyor belt, graph representation, Optics. Light, QC350-467, Electronic computers. Computer science, QA75.5-76.95
- Abstract
We focus on the problem of routing in a conveyor belt system based on a multi-agent approach. Most airport baggage belt conveyor systems use routing algorithms based on manual simulation of conveyor behavior. This approach does not scale well, and new research in machine learning proposes to solve the routing problem using reinforcement learning. To this end, we propose an approach to the joint learning of agents and vector representations of a graph. Within this approach, we develop a QSDNE algorithm, which uses DQN agents and SDNE embeddings. A comparative analysis was carried out with multi-agent routing algorithms without joint learning. The results of the QSDNE algorithm showed its effectiveness in optimizing delivery time and energy consumption in conveyor systems, as it helped to reduce mean delivery time by 6%. The proposed approach can be used to solve routing problems with complex path estimation functions and dynamically changing graph topologies, and the proposed algorithm can be used to control conveyor belts at airports and in manufacturing workshops.
- Published
- 2022
- Full Text
- View/download PDF
20. Signal Instructed Coordination in Cooperative Multi-agent Reinforcement Learning
- Author
-
Chen, Liheng, Guo, Hongyi, Du, Yali, Fang, Fei, Zhang, Haifeng, Zhang, Weinan, and Yu, Yong
- Published
- 2022
- Full Text
- View/download PDF
21. Deep Reinforcement Ant Colony Optimization for Swarm Learning
- Author
-
Bolshakov, Vladislav, Alfimtsev, Alexander, Sakulin, Sergey, and Bykov, Nikita
- Published
- 2022
- Full Text
- View/download PDF
22. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey.
- Author
-
Orr, James and Dutta, Ayan
- Subjects
- *DEEP reinforcement learning, *MEDICAL care, *ROBOTICS
- Abstract
Deep reinforcement learning has produced many success stories in recent years. Some example fields in which these successes have taken place include mathematics, games, health care, and robotics. In this paper, we are especially interested in multi-agent deep reinforcement learning, where multiple agents present in the environment not only learn from their own experiences but also from each other, and in its applications in multi-robot systems. In many real-world scenarios, one robot might not be enough to complete the given task on its own, and, therefore, we might need to deploy multiple robots that work together towards a common global objective of finishing the task. Although multi-agent deep reinforcement learning and its applications in multi-robot systems are of tremendous significance from theoretical and applied standpoints, the latest survey in this domain dates to 2004, albeit for traditional learning applications, as deep reinforcement learning had not yet been invented. We classify the reviewed papers in our survey primarily based on their multi-robot applications. Our survey also discusses a few challenges that current research in this domain faces and provides a potential list of future applications involving multi-robot systems that can benefit from advances in multi-agent deep reinforcement learning. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. Learning to play against any mixture of opponents
- Author
-
Max Olan Smith, Thomas Anthony, and Michael P. Wellman
- Subjects
- reinforcement learning (RL), transfer learning (TL), deep reinforcement learning (deep RL), value-based reinforcement learning, multi-agent learning, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. If the mixture changes, ideally we would not have to train from scratch, but rather could transfer what we have learned to construct a policy to play against the new mixture. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment, and a social dilemma game. Our experiments find that Q-Mixing can successfully transfer knowledge across any mixture of opponents. Next, we consider the use of observations during play to update the believed distribution of opponents. We introduce an opponent policy classifier—trained reusing Q-learning data—and use the classifier results to refine the mixing of Q-values. Q-Mixing augmented with the opponent policy classifier performs better, with higher variance, than training directly against a mixed-strategy opponent.
- Published
- 2023
- Full Text
- View/download PDF
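Entry 23's Q-Mixing rule reduces to averaging separately learned per-opponent Q-values, weighted by the believed opponent mixture. A minimal sketch with assumed shapes:

```python
# Minimal sketch of Q-Mixing from entry 23: Q-values learned against
# each pure-strategy opponent are averaged under the believed opponent
# distribution, giving a policy against any mixture without retraining.
# Array shapes are assumptions for illustration.
import numpy as np

def q_mixing(q_per_opponent: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """q_per_opponent: (n_opponents, n_actions) Q-values for one state.
    mixture: (n_opponents,) probabilities over opponent strategies."""
    return mixture @ q_per_opponent  # expected Q under the mixture

# Two pure-strategy opponents, three actions; act greedily vs a 70/30 mix.
q = np.array([[1.0, 0.2, 0.5],
              [0.1, 0.9, 0.3]])
action = int(np.argmax(q_mixing(q, np.array([0.7, 0.3]))))
```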
24. Behavior analysis of emergent rule discovery for cooperative automated driving using deep reinforcement learning.
- Author
-
Harada, Tomohiro, Matsuoka, Johei, and Hattori, Kiyohiko
- Abstract
With improvements in AI technology and sensor performance, research on automated driving has become increasingly popular. However, most studies are based on human driving styles. In this study, we consider an environment in which only autonomous vehicles are present. In such an environment, it is essential to develop an appropriate control method that actively utilizes the characteristics of autonomous vehicles, such as dense information exchange and highly accurate vehicle control. To address this issue, we investigated the emergence of automatic driving rules using reinforcement learning based on information from surrounding vehicles obtained through inter-vehicle communication. We evaluated whether reinforcement learning converges in a situation where distance sensor information can be shared in real time using vehicle-to-vehicle communication and whether reinforcement learning can learn a rational driving method. The simulation results show a positive trend in the cumulative reward value, indicating that the proposed multi-agent learning method with an extended own-vehicle environment has the potential to learn automated vehicle control with cooperative behavior automatically. Furthermore, we analyzed whether a rational driving method (action selection) can be learned by reinforcement learning. The simulation results showed that reinforcement learning achieves rational control of overtaking behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.
- Author
-
Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, and Weinan Zhang
- Subjects
- *MACHINE learning, *REINFORCEMENT learning, *MARL
- Abstract
Population-based multi-agent reinforcement learning (PB-MARL) encompasses a range of methods that merge dynamic population selection with multi-agent reinforcement learning algorithms (MARL). While PB-MARL has demonstrated notable achievements in complex multi-agent tasks, its sequential execution is plagued by low computational efficiency due to the diversity in computing patterns and policy combinations. We propose a solution involving a stateless central task dispatcher and stateful workers to handle PB-MARL's subroutines, thereby capitalizing on parallelism across various components for efficient problem-solving. In line with this approach, we introduce MALib, a parallel framework that incorporates a task control model, independent data servers, and an abstraction of MARL training paradigms. The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). [ABSTRACT FROM AUTHOR]
- Published
- 2023
26. Many-to-Many Data Aggregation Scheduling Based on Multi-Agent Learning for Multi-Channel WSN.
- Author
-
Lu, Yao, Wang, Keweiqi, and He, Erbao
- Subjects
- DISTRIBUTED algorithms, MULTICHANNEL communication, WIRELESS sensor networks, WIRELESS channels, SCHEDULING, MULTIAGENT systems, WIRELESS communications
- Abstract
Many-to-many data aggregation has become an indispensable technique for the simultaneous execution of multiple applications with less data traffic load and less energy consumption in a multi-channel WSN (wireless sensor network). How to efficiently allocate a time slot and channel to each node is one of the most critical problems for many-to-many data aggregation in multi-channel WSNs, and it can be solved with the new distributed, conflict-free scheduling method outlined in this paper. The many-to-many data aggregation scheduling process is abstracted as a decentralized partially observable Markov decision model in a multi-agent system. By embedding cooperative multi-agent learning technology, sensor nodes with group observability work in a distributed manner. These nodes cooperate and exploit local feedback information to automatically learn the optimal scheduling strategy, then select the best time slot and channel for wireless communication. Simulation results show that the new scheduling method has performance advantages when compared with existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
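Entry 26 casts scheduling as decentralized agents choosing a (time slot, channel) pair from local feedback. A minimal sketch with an assumed bandit-style tabular learner, not the paper's exact cooperative rule:

```python
# Minimal sketch of the decentralized scheduling idea from entry 26:
# each sensor node is an agent whose action is a (time slot, channel)
# pair learned from local feedback (e.g., conflict-free transmission).
# The tabular learner and reward convention are assumptions.
import numpy as np

N_SLOTS, N_CHANNELS = 8, 4

class NodeAgent:
    def __init__(self, lr=0.1, eps=0.1):
        self.q = np.zeros((N_SLOTS, N_CHANNELS))
        self.lr, self.eps = lr, eps

    def act(self, rng):
        if rng.random() < self.eps:                  # explore
            return rng.integers(N_SLOTS), rng.integers(N_CHANNELS)
        return np.unravel_index(self.q.argmax(), self.q.shape)  # exploit

    def update(self, slot, ch, reward):
        # Stateless bandit-style update from local conflict feedback.
        self.q[slot, ch] += self.lr * (reward - self.q[slot, ch])

rng = np.random.default_rng(3)
agent = NodeAgent()
slot, ch = agent.act(rng)
agent.update(slot, ch, reward=1.0)  # e.g., +1 for a conflict-free slot
```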
27. Joint Trajectory and Handover Management for UAVs Co-existing with Terrestrial Users : Deep Reinforcement Learning Based Approaches
- Author
-
Deng, Yuhang
- Abstract
Integrating unmanned aerial vehicles (UAVs) as aerial user equipments (UEs) into cellular networks is now considered a promising solution to provide extensive wireless connectivity for supporting UAV-centric commercial or civilian applications. However, the co-existence of UAVs with conventional terrestrial UEs is one of the primary challenges for this solution. Flying at higher altitudes with a maneuverability advantage, UAVs are able to establish line-of-sight (LoS) connectivity with more base stations (BSs) than terrestrial UEs. Although LoS connectivity reduces the communication delay of UAVs, it simultaneously increases the interference that UAVs cause to terrestrial UEs. In scenarios involving multiple UAVs, LoS connectivity can even lead to interference issues among the UAVs themselves. In addition, LoS connectivity leads to extensive overlapping coverage areas of multiple BSs for UAVs, forcing them to perform frequent handovers during flight if a received signal strength (RSS)-based handover policy is employed. The trajectories and BS associations of UAVs, along with their radio resource allocation, are essential design parameters aimed at enabling their seamless integration into cellular networks, with a particular focus on managing the interference levels they generate and reducing the redundant handovers they perform. Hence, this thesis designs two joint trajectory and handover management approaches, for single-UAV and multi-UAV scenarios respectively, aiming to minimize the weighted sum of three key performance indicators (KPIs): transmission delay, up-link interference, and handover number. The approaches are based on deep reinforcement learning (DRL) frameworks, with dueling double deep Q-network (D3QN) and Q-learning with a MIXer network (QMIX) algorithms selected as the training agents, respectively. The choice of these DRL algorithms is motivated by their capability in designing sequential decision-making policies consisting of trajectory design and handover management.
- Published
- 2024
28. Multi-agent Service Area Adaptation for Ride-Sharing Using Deep Reinforcement Learning
- Author
-
Yoshida, Naoki, Noda, Itsuki, and Sugawara, Toshiharu
- Published
- 2020
- Full Text
- View/download PDF
29. Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games.
- Author
-
Hao, Dong, Zhang, Dongcheng, Shi, Qi, and Li, Kai
- Subjects
- *REINFORCEMENT learning, *MAXIMUM entropy method, *EDUCATIONAL games, *ENTROPY, *LAGRANGE equations, *MARL
- Abstract
Multi-agent reinforcement learning (MARL) is an abstract framework modeling a dynamic environment that involves multiple learning and decision-making agents, each of which tries to maximize her cumulative reward. In MARL, each agent discovers a strategy alongside others and adapts her policy in response to the behavioural changes of others. A fundamental difficulty faced by MARL is that every agent is dynamically learning and changing to improve her reward, making the whole system unstable and agents' policies difficult to converge. In this paper, we introduce an entropy regularizer into the Bellman equation and utilize the Lagrange approach to optimize the entropy regularizer. We then propose a MARL algorithm based on the maximum entropy principle and the actor-critic method. This algorithm follows the policy gradient approach and uses a policy network and a value network. We call it Multi-Agent Deep Soft Policy Gradient (MADSPG). Then, using the Lagrange approach and dynamic minimax optimization, we propose the AUTO-MADSPG algorithm with an automatically adjusted entropy regularizer. These algorithms make multi-agent learning more stable while guaranteeing sufficient exploration. Finally, we also incorporate MADSPG with the recently proposed opponent modeling component into an integrated framework. This framework outperforms many state-of-the-art MARL algorithms in conventional cooperative and competitive game settings. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
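Entry 29 introduces an entropy regularizer into the Bellman equation. A small tabular sketch of the resulting soft value target follows; the temperature alpha and the tabular form are assumptions (MADSPG itself uses policy and value networks):

```python
# Minimal sketch of an entropy-regularized (soft) Bellman backup, the
# idea entry 29 builds on: the value target adds an entropy bonus so
# agents keep exploring. alpha and the tabular setting are assumptions.
import numpy as np

def soft_value(q_values: np.ndarray, alpha: float) -> float:
    """Soft state value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)."""
    return alpha * np.log(np.sum(np.exp(q_values / alpha)))

def soft_bellman_target(reward, next_q, alpha=0.2, gamma=0.99):
    """Entropy-regularized target: r + gamma * V_soft(s')."""
    return reward + gamma * soft_value(next_q, alpha)

target = soft_bellman_target(1.0, np.array([0.5, 0.1, -0.2]))
```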
30. Techniques and Paradigms in Modern Game AI Systems.
- Author
-
Lu, Yunlong and Li, Wenxin
- Subjects
- *REINFORCEMENT learning, *VIDEO game culture
- Abstract
Games have long been benchmarks and test-beds for AI algorithms. With the development of AI techniques and the boost of computational power, modern game AI systems have achieved superhuman performance in many games played by humans. These games have various features and present different challenges to AI research, so the algorithms used in each of these AI systems vary. This survey aims to give a systematic review of the techniques and paradigms used in modern game AI systems. By decomposing each of the recent milestones into basic components and comparing them based on the features of games, we summarize the common paradigms to build game AI systems and their scope and limitations. We claim that deep reinforcement learning is the most general methodology to become a mainstream method for games with higher complexity. We hope this survey can both provide a review of game AI algorithms and bring inspiration to the game AI community for future directions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. META: A City-Wide Taxi Repositioning Framework Based on Multi-Agent Reinforcement Learning.
- Author
-
Liu, Chenxi, Chen, Chao-Xiong, and Chen, Chao
- Abstract
The popularity of online ride-hailing platforms has made people travel smarter than ever before. But people still frequently encounter the dilemma of “taxi drivers hunt passengers and passengers search for unoccupied taxis”. Many studies try to reposition idle taxis to alleviate such issues by using reinforcement learning based methods, as they are capable of capturing future demand/supply dynamics. However, they either coordinate all city-wide taxis in a centralized manner or treat all taxis in a region homogeneously, resulting in inefficient or inaccurate learning performance. In this paper, we propose a multi-agent reinforcement learning based framework named META (MakE Taxi Act differently in each agent) to mitigate the disequilibrium of supply and demand via repositioning taxis at the city scale. We decompose it into two subproblems, i.e., taxi demand/supply determination and taxi dispatching strategy formulation. Two components are wisely built in META to address the gap collaboratively, in which each region is regarded as an agent and taxis inside the region can make two different actions. Extensive experiments demonstrate that META outperforms existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. A multi‐agent based mechanism for collaboratively detecting distributed denial of service attacks in internet of vehicles.
- Author
-
Dong, Tingting, Chen, Lei, Zhou, Li, Xue, Fei, and Qin, Huilin
- Subjects
- DENIAL of service attacks, HIDDEN Markov models, VITERBI decoding, REINFORCEMENT learning, BOTNETS, INTRUSION detection systems (Computer security)
- Abstract
Distributed denial of service (DDoS) attacks have become a hidden danger in the development of the internet of vehicles (IoV). DDoS attacks on the TCP protocol are studied to improve the information security environment of IoV. To address the distributed nature of DDoS attacks, an information-sharing and collaborative detection mechanism based on multiple agents is proposed. Considering the relationship between the features of adjacent moments in TCP communication, a DDoS detection model based on a hidden Markov model is built, and the Viterbi algorithm is improved to address false alarms in the observation sequence. The optimal communication strategy among agents is determined by deep reinforcement learning, and a fusion algorithm is designed to improve the current strategy of the agents. Three groups of comparative experiments are designed and analyzed. The simulation results show that the proposed algorithms are effective. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
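Entry 32 builds its detection model on a hidden Markov model decoded with an improved Viterbi algorithm. For context, a standard Viterbi pass over a two-state normal/attack model with assumed probabilities, not the paper's fitted model or its improved variant:

```python
# Standard Viterbi decoding, the HMM step entry 32 builds on: recover
# the most likely hidden state path (normal vs. under-attack) from
# observed traffic features. Model and probabilities are assumptions.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """obs: observation indices; returns the most likely state path."""
    logp = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = []
    for o in obs[1:]:
        cand = logp[:, None] + np.log(trans_p)   # (from_state, to_state)
        back.append(cand.argmax(axis=0))         # best predecessor
        logp = cand.max(axis=0) + np.log(emit_p[:, o])
    path = [int(logp.argmax())]
    for ptr in reversed(back):                   # backtrack
        path.append(int(ptr[path[-1]]))
    return path[::-1]

states = viterbi(obs=[0, 1, 1],
                 start_p=np.array([0.9, 0.1]),   # normal, attack
                 trans_p=np.array([[0.8, 0.2], [0.3, 0.7]]),
                 emit_p=np.array([[0.7, 0.3], [0.2, 0.8]]))
```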
33. Fighting for Routes: Resource Allocation among Competing Planners in Transportation Networks
- Author
-
Charlotte Roman and Paolo Turrini
- Subjects
- resource allocation, congestion games, multi-agent learning, efficiency, Technology, Social Sciences
- Abstract
In transportation networks, incomplete information is ubiquitous, and users often delegate their route choice to distributed route planners. To model and study these systems, we introduce network control games, consisting of multiple actors seeking to optimise the social welfare of their assigned subpopulations through resource allocation in an underlying nonatomic congestion game. We first analyse the inefficiency of the routing equilibria by calculating the Price of Anarchy for polynomial cost functions, and then, using an Asynchronous Advantage Actor–Critic algorithm implementation, we show that reinforcement learning agents are vulnerable to choosing suboptimal routing as predicted by the theory. Finally, we extend the analysis to allow vehicles to choose their route planner and study the associated equilibria. Our results can be applied to mitigate inefficiency issues arising in large transport networks with route controlled autonomous vehicles.
- Published
- 2023
- Full Text
- View/download PDF
34. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey
- Author
-
James Orr and Ayan Dutta
- Subjects
- deep reinforcement learning, multi-robot systems, multi-agent learning, survey, Chemical technology, TP1-1185
- Abstract
Deep reinforcement learning has produced many success stories in recent years. Some example fields in which these successes have taken place include mathematics, games, health care, and robotics. In this paper, we are especially interested in multi-agent deep reinforcement learning, where multiple agents present in the environment not only learn from their own experiences but also from each other, and in its applications in multi-robot systems. In many real-world scenarios, one robot might not be enough to complete the given task on its own, and, therefore, we might need to deploy multiple robots that work together towards a common global objective of finishing the task. Although multi-agent deep reinforcement learning and its applications in multi-robot systems are of tremendous significance from theoretical and applied standpoints, the latest survey in this domain dates to 2004, albeit for traditional learning applications, as deep reinforcement learning had not yet been invented. We classify the reviewed papers in our survey primarily based on their multi-robot applications. Our survey also discusses a few challenges that current research in this domain faces and provides a potential list of future applications involving multi-robot systems that can benefit from advances in multi-agent deep reinforcement learning.
- Published
- 2023
- Full Text
- View/download PDF
35. CONTINUOUS-TIME CONVERGENCE RATES IN POTENTIAL AND MONOTONE GAMES.
- Author
-
Gao, Bolin and Pavel, Lacra
- Subjects
- *NASH equilibrium, *POTENTIAL functions, *TELEVISION game programs, *GAMES, *EQUILIBRIUM
- Abstract
In this paper, we provide exponential rates of convergence to the interior Nash equilibrium for continuous-time dual-space game dynamics such as mirror descent (MD) and actor-critic (AC). We perform our analysis in N-player continuous concave games that satisfy certain monotonicity assumptions while possibly also admitting potential functions. In the first part of this paper, we provide a novel relative characterization of monotone games and show that MD and its discounted version converge with rate $\mathcal{O}(e^{-\beta t})$ in relatively strongly and relatively hypomonotone games, respectively. In the second part of this paper, we specialize our results to games that admit a relatively strongly concave potential and show that AC converges with rate $\mathcal{O}(e^{-\beta t})$. These rates extend their known convergence conditions. Simulations are performed which empirically back up our results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
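Entry 35's rates can be stated schematically as below; the notation (equilibrium $x^\ast$, modulus $\beta$, constant $C$) is assumed for illustration rather than copied from the paper.

```latex
% Schematic form of the O(e^{-beta t}) convergence claimed in entry 35,
% under a relative strong monotonicity assumption with modulus beta.
% x(t): mirror-descent trajectory; x*: interior Nash equilibrium;
% C: a constant depending on the initialization (assumed notation).
\[
  \| x(t) - x^{\ast} \| \le C\, e^{-\beta t}, \qquad t \ge 0 .
\]
```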
36. A Framework for Dynamic Decision Making by Multi-agent Cooperative Fault Pair Algorithm (MCFPA) in Retail Shop Application
- Author
-
Vidhate, Deepak A. and Kulkarni, Parag
- Published
- 2019
- Full Text
- View/download PDF
37. Impact of Neighboring Agent’s Characteristics with Q-Learning in Network Multi-agent System
- Author
-
Kaur, Harjot and Devi, Ginni
- Published
- 2019
- Full Text
- View/download PDF
38. Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination
- Author
-
Han, Dongge, Böhmer, Wendelin, Wooldridge, Michael, and Rogers, Alex
- Published
- 2019
- Full Text
- View/download PDF
39. User-Centric Radio Access Technology Selection: A Survey of Game Theory Models and Multi-Agent Learning Algorithms
- Author
-
Giuseppe Caso, Ozgu Alay, Guido Carlo Ferrante, Luca De Nardis, Maria-Gabriella Di Benedetto, and Anna Brunstrom
- Subjects
- Radio access technology selection, game theory, multi-agent learning, reinforcement learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
User-centric radio access technology (RAT) selection is a key communication paradigm, given the increased number of available RATs and increased cognitive capabilities at the user end. When considered against traditional network-centric approaches, user-centric RAT selection results in reduced network-side management load and leads to lower operational costs for RATs, as well as improved quality of service (QoS) and quality of experience (QoE) for users. The complex interactions between users involved in RAT selection require, however, specific analyses toward developing reliable and efficient schemes. Two theoretical frameworks are most often applied to user-centric RAT selection analysis, i.e., game theory (GT) and multi-agent learning (MAL). As a consequence, several GT models and MAL algorithms have been recently proposed to solve the problem at hand. A comprehensive discussion of such models and algorithms is, however, currently missing. Moreover, novel issues introduced by next-generation communication systems also need to be addressed. This paper proposes to fill the above gaps by providing a unified reference for both ongoing research and future research directions in the field. In particular, the review addresses the most common GT and MAL models and algorithms, and scenario settings adopted in user-centric RAT selection in terms of utility function and network topology. Regarding GT, the review focuses on non-cooperative models, because of their widespread use in RAT selection; as for MAL, a large number of algorithms are described, ranging from game-theoretic to reinforcement learning (RL) schemes, and also including most recent approaches, such as deep RL (DRL) and multi-armed bandit (MAB). Models and algorithms are analyzed by comparatively reviewing relevant literature. Finally, open challenges are discussed, in light of ongoing research and standardization activities.
- Published
- 2021
- Full Text
- View/download PDF
40. Reward-based epigenetic learning algorithm for a decentralised multi-agent system
- Author
-
Mukhlish, Faqihza, Page, John, and Bain, Michael
- Published
- 2020
- Full Text
- View/download PDF
41. Dynamical systems as a level of cognitive analysis of multi-agent learning: Algorithmic foundations of temporal-difference learning dynamics.
- Author
-
Barfuss, Wolfram
- Subjects
- *DYNAMICAL systems, *COGNITIVE analysis, *REINFORCEMENT learning, *GAME theory, *MULTIAGENT systems
- Abstract
A dynamical systems perspective on multi-agent learning, based on the link between evolutionary game theory and reinforcement learning, provides an improved, qualitative understanding of the emerging collective learning dynamics. However, confusion exists with respect to how this dynamical systems account of multi-agent learning should be interpreted. In this article, I propose to embed the dynamical systems description of multi-agent learning into different abstraction levels of cognitive analysis. The purpose of this work is to make the connections between these levels explicit in order to gain improved insight into multi-agent learning. I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning. I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty. I then propose an on-line sample-batch temporal-difference algorithm which is characterized by the combination of applying a memory-batch and separated state-action value estimation. I find that this algorithm serves as a micro-foundation of the deterministic learning equations by showing that its learning trajectories approach the ones of the deterministic learning equations under large batch sizes. Ultimately, this framework of embedding a dynamical systems description into different abstraction levels gives guidance on how to unleash the full potential of the dynamical systems approach to multi-agent learning. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
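Entry 41 proposes an on-line sample-batch temporal-difference algorithm whose learning trajectories approach the deterministic learning equations for large batch sizes. A minimal tabular sketch with assumed constants:

```python
# Minimal sketch of the sample-batch TD idea from entry 41: transitions
# accumulated in a memory batch update the value estimates in one
# averaged step, so larger batches move the update toward the
# deterministic learning equations. Tabular form and constants assumed.
import numpy as np

def batch_td_update(values, batch, lr=0.1, gamma=0.95):
    """values: (n_states,) estimates; batch: list of (s, r, s_next)."""
    deltas = np.zeros_like(values)
    counts = np.zeros(len(values))
    for s, r, s_next in batch:
        deltas[s] += r + gamma * values[s_next] - values[s]  # TD error
        counts[s] += 1
    mask = counts > 0
    values[mask] += lr * deltas[mask] / counts[mask]  # averaged update
    return values

v = batch_td_update(np.zeros(3), [(0, 1.0, 1), (0, 0.5, 2), (1, 0.0, 2)])
```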
42. Multi-agent deep reinforcement learning: a survey.
- Author
-
Gronauer, Sven and Diepold, Klaus
- Subjects
- DEEP learning, REINFORCEMENT learning, MACHINE learning
- Abstract
The advances in reinforcement learning have recorded sublime success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart during this progress, multi-agent reinforcement learning gains rapid traction, and the latest accomplishments address problems with real-world complexity. This article provides an overview of the current developments in the field of multi-agent deep reinforcement learning. We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario. To survey the works that constitute the contemporary landscape, the main contents are divided into three parts. First, we analyze the structure of training schemes that are applied to train multiple agents. Second, we consider the emergent patterns of agent behavior in cooperative, competitive and mixed scenarios. Third, we systematically enumerate challenges that exclusively arise in the multi-agent domain and review methods that are leveraged to cope with these challenges. To conclude this survey, we discuss advances, identify trends, and outline possible directions for future work in this research area. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
43. Spatial Positioning Token (SPToken) for Smart Mobility.
- Author
-
Overko, Roman, Ordonez-Hurtado, Rodrigo, Zhuk, Sergiy, Ferraro, Pietro, Cullen, Andrew, and Shorten, Robert
- Abstract
We introduce a permissioned distributed ledger technology (DLT) design for crowdsourced smart mobility applications. This architecture is based on a directed acyclic graph architecture (similar to the IOTA tangle) and uses both Proof-of-Work and Proof-of-Position mechanisms to provide protection against spam attacks and malevolent actors. In addition to enabling individuals to retain ownership of their data and to monetize it, the architecture is also suitable for distributed privacy-preserving machine learning algorithms, is lightweight, and can be implemented in simple internet-of-things (IoT) devices. To demonstrate its efficacy, we apply this framework to reinforcement learning settings where a third party is interested in acquiring information from agents. In particular, one may be interested in sampling an unknown vehicular traffic flow in a city, using a DLT-type architecture and without perturbing the density, with the idea of realizing a set of virtual tokens as surrogates of real vehicles to explore geographical areas of interest. These tokens, whose authenticated position determines write access to the ledger, are thus used to emulate the probing actions of commanded (real) vehicles on a given planned route by “jumping” from a passing-by vehicle to another to complete the planned trajectory. Consequently, the environment stays unaffected (i.e., the autonomy of participating vehicles is not influenced by the algorithm), regardless of the number of emitted tokens. The design of such a DLT architecture is presented, and numerical results from large-scale simulations are provided to validate the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC).
- Author
-
Mondal, Washim Uddin, Agarwal, Mridul, Aggarwal, Vaneet, and Ukkusuri, Satish V.
- Subjects
- *REINFORCEMENT learning, *MARL, *MARGINAL distributions, *SAMPLING errors
- Abstract
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of (1) joint state and action distributions across all classes, (2) individual distributions of each class, and (3) marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1 = \mathcal{O}\!\left(\frac{\sqrt{|X|}+\sqrt{|U|}}{N_{\mathrm{pop}}}\sum_k \sqrt{N_k}\right)$, $e_2 = \mathcal{O}\!\left(\left[\sqrt{|X|}+\sqrt{|U|}\right]\sum_k \frac{1}{\sqrt{N_k}}\right)$, and $e_3 = \mathcal{O}\!\left(\left[\sqrt{|X|}+\sqrt{|U|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A$, $B$ are some constants and $|X|$, $|U|$ are the sizes of the state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j \in \{1,2,3\}$, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
45. The role of information structures in game-theoretic multi-agent learning.
- Author
-
Li, Tao, Zhao, Yuhan, and Zhu, Quanyan
- Subjects
- *
DATA structures , *REINFORCEMENT learning , *GAMIFICATION - Abstract
Multi-agent learning (MAL) studies how agents learn to behave optimally and adaptively from their experience when interacting with other agents in dynamic environments. The outcome of a MAL process is jointly determined by all agents' decision-making, so each agent must think strategically about others' sequential moves when planning future actions. These strategic interactions make MAL more than a direct extension of single-agent learning to multiple agents. With this strategic thinking, each agent aims to build a subjective model of others' decision-making from its observations. Such modeling is directly influenced by what agents can perceive during the learning process, which is called the information structure of the agent's learning. Because it determines the input to MAL processes, the information structure plays a significant role in the learning mechanisms of the agents. This review creates a taxonomy of MAL and establishes a unified and systematic way to understand MAL from the perspective of information structures. We define three fundamental components of MAL: the information structure (i.e., what the agent can observe), belief generation (i.e., how the agent forms a belief about others based on its observations), and policy generation (i.e., how the agent generates its policy based on its belief). In addition, this taxonomy enables the classification of a wide range of state-of-the-art algorithms into four categories based on the belief-generation mechanisms for the opponents: stationary, conjectured, calibrated, and sophisticated opponents. We introduce Value of Information (VoI) as a metric to quantify the impact of different information structures on MAL. Finally, we discuss the strengths and limitations of algorithms from the different categories and point to promising avenues of future research. [ABSTRACT FROM AUTHOR]
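As a minimal sketch of the three components named above, the snippet below wires an information structure (observed opponent actions), the simplest belief-generation category (a stationary opponent model), and best-response policy generation. The class and function names are our own illustrations, not the review's notation.

```python
# Observe -> generate belief -> generate policy, the MAL loop in miniature.
from collections import Counter
import random

class StationaryOpponentModel:
    """Belief generation for the simplest category: assume the opponent's
    mixed strategy is stationary and estimate it from observed actions."""
    def __init__(self, actions):
        self.actions = actions
        self.counts = Counter({a: 1 for a in actions})  # Laplace-smoothed

    def update(self, observed_action):   # information structure: actions are observed
        self.counts[observed_action] += 1

    def belief(self):                    # belief generation
        total = sum(self.counts.values())
        return {a: c / total for a, c in self.counts.items()}

def best_response(belief, payoff):       # policy generation
    # Pick the action maximizing expected payoff under the current belief.
    return max(payoff, key=lambda mine: sum(p * payoff[mine][theirs]
                                            for theirs, p in belief.items()))

# Toy matching-pennies-style interaction against a biased opponent.
payoff = {"H": {"H": 1, "T": -1}, "T": {"H": -1, "T": 1}}
model = StationaryOpponentModel(["H", "T"])
for _ in range(200):
    opp = random.choices(["H", "T"], weights=[0.7, 0.3])[0]
    model.update(opp)
print("belief:", model.belief(), "-> play:", best_response(model.belief(), payoff))
```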
- Published
- 2022
- Full Text
- View/download PDF
46. Multi-Agent Learning-Based Nearly Non-Iterative Stochastic Dynamic Transactive Energy Control of Networked Microgrids.
- Author
-
Pan, Zhenning, Yu, Tao, Li, Jie, Wu, Yufeng, Chen, Junbin, Lu, Jidong, and Zhang, Xiaoshun
- Abstract
Coordination of networked microgrids (MGs) offers a promising way to exploit the flexibility of distributed resources and accommodate renewable energy. This paper studies real-time coordination between the distribution system operator (DSO) and MGs under multivariate uncertainty. Existing research suffers from inadaptability to dynamic system uncertainty, extensive iterations, and dependence on prediction. To fill these gaps, a novel multi-agent learning-based stochastic dynamic programming (MASDP) method is proposed to obtain the optimal policy for each entity. Specifically, a transactive energy control (TEC) mechanism, which requires only market-based information interactions, is employed to facilitate coordination between the MGs and the DSO. A data-driven offline self-learning scheme is proposed so that entities learn how to manage resources in response to system uncertainty. After sufficient offline learning, the online operation of MASDP can run in both non-iterative and iterative manners, by which near-optimal or optimal real-time solutions for the DSO and MGs are obtained sequentially and in a distributed fashion. Numerical comparisons with state-of-the-art policies and TEC algorithms verify the optimality, efficiency, adaptability, and scalability of MASDP. [ABSTRACT FROM AUTHOR]
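The non-iterative online mode described above can be pictured as a single price-broadcast/response exchange per step. The sketch below is a schematic stand-in under our own assumptions: the linear microgrid response (mg_policy) and congestion-based price rule (dso_price) are invented illustrations, not the paper's learned MASDP policies.

```python
# Schematic transactive energy control exchange: DSO posts a price,
# each microgrid answers with a schedule from its (offline-learned) policy.

def mg_policy(price, flexibility=10.0, baseline=50.0):
    # Stand-in for an offline-learned response: consume less as price rises.
    return max(0.0, baseline - flexibility * price)

def dso_price(total_demand, capacity=120.0):
    # Price rises with congestion on the distribution system.
    return max(0.1, total_demand / capacity)

price = 0.5
for step in range(5):
    # One forward pass per step: no inner iteration loop is needed online.
    demands = [mg_policy(price) for _ in range(3)]
    price = dso_price(sum(demands))
    print(f"step {step}: demands={demands}, next price={price:.3f}")
```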
- Published
- 2022
- Full Text
- View/download PDF
47. Multi‐agent reinforcement learning via knowledge transfer with differentially private noise.
- Author
-
Cheng, Zishuo, Ye, Dayong, Zhu, Tianqing, Zhou, Wanlei, Yu, Philip S., and Zhu, Congcong
- Subjects
KNOWLEDGE transfer ,REINFORCEMENT learning ,TRANSFER of training ,LEARNING problems ,NOISE ,INSTRUCTIONAL systems - Abstract
In multi-agent reinforcement learning, transfer learning is one of the key techniques used to speed up learning through the exchange of knowledge among agents. However, three challenges arise when applying this technique to real-world problems. First, most real-world domains are partially rather than fully observable. Second, it is difficult to pre-collect knowledge in unknown domains. Third, negative transfer impedes learning progress. We observe that differentially private mechanisms can overcome these challenges thanks to their randomization property. We therefore propose a novel differential transfer learning method for multi-agent reinforcement learning, characterized by three key features. First, our method allows agents to transfer knowledge to each other in real time in partially observable domains. Second, it eliminates constraints on the relevance of transferred knowledge, which greatly expands the usable knowledge set. Third, it improves robustness to negative transfer by applying differentially private exponential noise and relevance weights to the transferred knowledge. The proposed method is the first to use the randomization property of differential privacy to improve learning performance in multi-agent reinforcement learning systems. We further conduct extensive experiments to demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
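The "differentially private exponential noise" ingredient can be illustrated with the standard exponential mechanism, where a piece of advice is sampled with probability proportional to exp(ε·relevance / 2Δ), so the privacy randomization doubles as exploration that softens negative transfer. The snippet below is a minimal sketch under that reading; the advice items, relevance scores, and ε values are invented for illustration.

```python
# Exponential-mechanism selection of advice to transfer between agents.
import math
import random
from collections import Counter

def exponential_mechanism(candidates, relevance, epsilon, sensitivity=1.0):
    # Pr[advice] proportional to exp(eps * relevance / (2 * sensitivity)).
    weights = [math.exp(epsilon * relevance[c] / (2 * sensitivity)) for c in candidates]
    return random.choices(candidates, weights=weights)[0]

advice = ["go_left", "go_right", "wait"]
relevance = {"go_left": 0.9, "go_right": 0.4, "wait": 0.1}  # similarity to own task
for eps in (0.1, 1.0, 10.0):
    picks = Counter(exponential_mechanism(advice, relevance, eps) for _ in range(1000))
    print(f"epsilon={eps}: {picks}")  # low eps -> near-uniform (more exploratory),
                                      # high eps -> mostly the most relevant advice
```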
- Published
- 2022
- Full Text
- View/download PDF
48. Machine Learning Empowered Spectrum Sharing in Intelligent Unmanned Swarm Communication Systems: Challenges, Requirements and Solutions
- Author
-
Wang, Ximing, Xu, Yuhua, Chen, Chaohui, Yang, Xiaoqin, Chen, Jiaxin, Ruan, Lang, Xu, Yifan, and Chen, Runfeng
- Subjects
Unmanned swarm system ,spectrum sharing ,machine learning ,multi-agent learning ,game theory ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The unmanned swarm system (USS) is seen as a promising technology that will play an important role in both military and civilian fields, such as military strikes, disaster relief, and transportation. As the “nerve center” of the USS, the unmanned swarm communication system (USCS) provides the information transmission medium needed to ensure system stability and mission execution. However, challenges posed by multiple tasks, distributed collaboration, high dynamics, ultra-dense deployment, and jamming threats make it hard for the USCS to manage limited spectrum resources. To tackle these problems, this paper introduces machine learning (ML)-empowered intelligent spectrum management. First, based on the challenges of spectrum resource management in the USCS, the requirements for spectrum sharing are analyzed from the perspectives of spectrum collaboration and spectrum confrontation. We find that suitable multi-agent collaborative decision-making is promising for effective spectrum sharing from both perspectives. We therefore propose a multi-agent learning framework comprising mobile-computing-assisted and distributed structures, and provide case studies based on this framework. Finally, future research directions are discussed.
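As a concrete, hedged illustration of multi-agent decision-making for spectrum sharing, the sketch below uses independent stateless Q-learners that are penalized when they collide on a channel. It is a generic baseline of our own construction, not the framework proposed in the paper; constants such as N_CHANNELS are illustrative.

```python
# Each swarm node independently learns a channel choice with epsilon-greedy
# stateless Q-learning; collisions (shared channels) yield negative reward.
import random

N_AGENTS, N_CHANNELS, EPS, ALPHA = 4, 4, 0.1, 0.2
Q = [[0.0] * N_CHANNELS for _ in range(N_AGENTS)]

def choose(qrow):
    if random.random() < EPS:                      # epsilon-greedy exploration
        return random.randrange(N_CHANNELS)
    return max(range(N_CHANNELS), key=qrow.__getitem__)

for step in range(5000):
    picks = [choose(Q[i]) for i in range(N_AGENTS)]
    for i, ch in enumerate(picks):
        reward = 1.0 if picks.count(ch) == 1 else -1.0  # collision -> jamming-like loss
        Q[i][ch] += ALPHA * (reward - Q[i][ch])         # stateless Q update

# With enough steps the agents settle on an orthogonal channel allocation.
print("learned channels:", [max(range(N_CHANNELS), key=Q[i].__getitem__)
                            for i in range(N_AGENTS)])
```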
- Published
- 2020
- Full Text
- View/download PDF
49. Dynamic opponent modelling in two-player games
- Author
-
Mealing, Richard Andrew, Shapiro, Jonathan, and Brown, Gavin
- Subjects
006.3 ,decision-making ,imperfect information ,learning in games ,dynamic opponents ,opponent modelling ,sequence prediction ,change detection ,expectation-maximisation ,reinforcement learning ,lookahead ,best-response ,Nash equilibrium ,self-play convergence ,iterated normal-form games ,simplified poker ,multi-agent learning ,game theory - Abstract
This thesis investigates decision-making in two-player imperfect-information games against opponents whose actions can affect our rewards and whose strategies may be based on memories of interaction, may be changing, or both. The focus is on modelling these dynamic opponents and using the models to learn high-reward strategies. The main contributions of this work are:
1. An approach to learning high-reward strategies in small simultaneous-move games against these opponents, by using an opponent model learnt from sequence prediction, with (possibly discounted) rewards learnt from reinforcement learning, to look ahead via explicit tree search. Empirical results show that this earns higher average rewards per game than state-of-the-art reinforcement learning agents in three simultaneous-move games, and that several sequence prediction methods model these opponents effectively, supporting the idea of borrowing such methods from areas like data compression and string matching.
2. An online expectation-maximisation algorithm that infers an agent's hidden information from its behaviour in imperfect-information games.
3. An approach to learning high-reward strategies in medium-size sequential-move poker games against these opponents, by using an opponent model learnt from sequence prediction, which requires the hidden information inferred by the online expectation-maximisation algorithm, to train a state-of-the-art no-regret learning algorithm via simulated games between the algorithm and the model. Empirical results show that this improves the no-regret learner's rewards against popular and state-of-the-art algorithms in two simplified poker games.
4. A demonstration that several change detection methods can effectively model changing categorical distributions, with experimental results comparing their accuracy against empirical distributions. These results also show that their models can outperform state-of-the-art reinforcement learning agents in two simultaneous-move games, supporting the idea of modelling changing opponent strategies with change detection methods.
5. Experimental results on the self-play convergence of the empirical distributions of play of sequence prediction and change detection methods to mixed-strategy Nash equilibria, showing that they converge faster than fictitious play, and in more cases for change detection.
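As a hedged sketch of the first contribution, the snippet below stands in a simple order-1 Markov predictor for the thesis's sequence-prediction methods and uses one-step lookahead (rather than full tree search) against a cyclic opponent in rock-paper-scissors; all names are our own illustrations.

```python
# Model the opponent with a sequence predictor, then best-respond to its
# predicted next move.
import random
from collections import defaultdict, Counter

class MarkovPredictor:
    """Predict the opponent's next move from its previous move."""
    def __init__(self, actions):
        self.actions = actions
        self.table = defaultdict(Counter)
        self.prev = None

    def observe(self, action):
        if self.prev is not None:
            self.table[self.prev][action] += 1
        self.prev = action

    def predict(self):
        counts = self.table[self.prev]
        if not counts:
            return {a: 1 / len(self.actions) for a in self.actions}
        total = sum(counts.values())
        return {a: counts[a] / total for a in self.actions}

# Rock-paper-scissors payoffs for the row player.
PAYOFF = {("R", "S"): 1, ("S", "P"): 1, ("P", "R"): 1,
          ("S", "R"): -1, ("P", "S"): -1, ("R", "P"): -1,
          ("R", "R"): 0, ("P", "P"): 0, ("S", "S"): 0}
ACTIONS = ["R", "P", "S"]

model, score = MarkovPredictor(ACTIONS), 0
opp_prev = "R"
for _ in range(2000):
    belief = model.predict()
    mine = max(ACTIONS, key=lambda a: sum(p * PAYOFF[(a, o)] for o, p in belief.items()))
    opp = {"R": "P", "P": "S", "S": "R"}[opp_prev]  # a cyclic (dynamic) opponent
    score += PAYOFF[(mine, opp)]
    model.observe(opp)
    opp_prev = opp
print("average reward vs cyclic opponent:", score / 2000)
```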
- Published
- 2015
50. A Theoretical Framework for Large-Scale Human-Robot Interaction with Groups of Learning Agents.
- Author
-
Teh, Nicholas, Hu, Shuyue, and Soh, Harold
- Subjects
HUMAN-robot interaction ,SOCIAL interaction ,OPEN-ended questions ,ROBOTS - Abstract
Recent advances in robot capabilities have led to a growing consensus that robots will eventually be deployed at scale across numerous application domains. An important open question is how humans and robots will adapt to one another over time. In this paper, we introduce the model-based Theoretical Human-Robot Scenarios (THuS) framework, capable of elucidating the interactions between large groups of humans and learning robots. We formally establish THuS and consider its application to a human-robot variant of the n-player coordination game, demonstrating the power of the theoretical framework as a tool to qualitatively understand and quantitatively compare HRI scenarios involving different agent types. We also discuss the framework's limitations and potential. Our work provides the HRI community with a versatile tool that permits first-cut insights into large-scale HRI scenarios that are too costly or challenging to carry out in simulation or in the real world. [ABSTRACT FROM AUTHOR]
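To make the setting concrete, the following toy simulation, entirely our own construction and not the THuS formalism, shows a large group of convention-following humans and reinforcement-style robots converging in an n-player coordination game where everyone is rewarded by the share of the population matching their choice.

```python
# Humans imitate the majority occasionally; robots reinforce whichever
# option coordinated better, so mixed human-robot groups reach a convention.
import random

N_HUMANS, N_ROBOTS, STEPS, ALPHA = 80, 20, 300, 0.1
options = [0, 1]
humans = [random.choice(options) for _ in range(N_HUMANS)]  # sticky conventions
robots = [[0.5, 0.5] for _ in range(N_ROBOTS)]              # learned preferences

for _ in range(STEPS):
    robot_picks = [random.choices(options, weights=w)[0] for w in robots]
    pop = humans + robot_picks
    share = [pop.count(o) / len(pop) for o in options]
    # Robots reinforce the option that coordinated better this round.
    for w, pick in zip(robots, robot_picks):
        w[pick] += ALPHA * share[pick]
        s = sum(w)
        w[0], w[1] = w[0] / s, w[1] / s
    # Humans occasionally switch to the current majority convention.
    humans = [max(options, key=lambda o: share[o]) if random.random() < 0.05 else h
              for h in humans]

print("final convention counts:", [(humans + robot_picks).count(o) for o in options])
```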
- Published
- 2021
- Full Text
- View/download PDF