43 results
Search Results
2. Research on a Personalized Decision Control Algorithm for Autonomous Vehicles Based on the Reinforcement Learning from Human Feedback Strategy.
- Author
- Li, Ning and Chen, Pengzhan
- Subjects
- LEARNING, REINFORCEMENT learning, AUTONOMOUS vehicles, ALGORITHMS, DEEP learning
- Abstract
To address the shortcomings of previous autonomous decision models, which often overlook the personalized features of users, this paper proposes a personalized decision control algorithm for autonomous vehicles based on RLHF (reinforcement learning from human feedback). The algorithm combines two reinforcement learning approaches, DDPG (Deep Deterministic Policy Gradient) and PPO (Proximal Policy Optimization), and divides the control scheme into three phases: pre-training, human evaluation, and parameter optimization. During the pre-training phase, an agent is trained using the DDPG algorithm. In the human evaluation phase, trajectories generated by the DDPG-trained agent are scored by individuals with different styles, and a reward model is trained for each style from the scored trajectories. In the parameter optimization phase, the network parameters are updated using the PPO algorithm and the reward values given by the reward model to achieve personalized autonomous vehicle control. To validate the proposed control algorithm, a simulation scenario was built in CARLA 0.9.13. The results demonstrate that the algorithm can provide personalized decision control solutions for people of different styles, satisfying human needs while ensuring safety. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
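The human-evaluation phase described in the abstract above (style-specific raters score trajectories, then a per-style reward model is fit to those scores) can be sketched minimally. This assumes a linear reward model over hand-picked trajectory features; the feature choice, scores, and function names are illustrative, not from the paper, which trains a neural reward model:

```python
import numpy as np

def fit_reward_model(trajectory_features, human_scores):
    """Least-squares fit of a linear reward model r(phi) = w . phi + b
    to human-assigned trajectory scores (a deliberate simplification of
    the paper's learned reward network)."""
    X = np.asarray(trajectory_features, dtype=float)
    y = np.asarray(human_scores, dtype=float)
    # Append a bias column so the model can learn an offset.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_reward(w, features):
    f = np.asarray(features, dtype=float)
    return float(np.dot(np.append(f, 1.0), w))

# Three trajectories summarised by (avg speed, avg jerk); an "aggressive"
# rater scores fast, jerky driving highly. A "cautious" rater would supply
# a different score vector, yielding a different reward model.
feats = [[0.9, 0.8], [0.5, 0.2], [0.2, 0.1]]
scores = [9.0, 5.0, 2.0]
w = fit_reward_model(feats, scores)
```

The per-style reward model produced this way is what the PPO phase would then maximize to personalize the driving policy.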
3. Bio-Inspired Intelligent Swarm Confrontation Algorithm for a Complex Urban Scenario.
- Author
- Cai, He, Luo, Yaoguo, Gao, Huanli, and Wang, Guangbin
- Subjects
- BIOLOGICALLY inspired computing, MACHINE learning, WILDLIFE films, REINFORCEMENT learning, ALGORITHMS
- Abstract
This paper considers the confrontation problem for two tank swarms of equal size and capability in a complex urban scenario. Based on the Unity platform (2022.3.20f1c1), a confrontation scenario featuring multiple crossing roads is constructed. Through the analysis of a substantial amount of biological data and wildlife videos on animal behavioral strategies during confrontations for hunting or food competition, two strategies are utilized to design a novel bio-inspired intelligent swarm confrontation algorithm. The first is the "fire concentration" strategy, which assigns a target to each tank such that an isolated opponent is preferentially attacked with concentrated firepower. The second is the "back and forth maneuver" strategy, which makes a tank tactically retreat after firing to avoid being hit while its shell is reloading. Two state-of-the-art swarm confrontation algorithms, namely the reinforcement learning algorithm and the assign-nearest algorithm, are chosen as opponents for the proposed bio-inspired swarm confrontation algorithm. Data from comprehensive confrontation tests show that the bio-inspired swarm confrontation algorithm has significant advantages over its opponents in both win rate and efficiency. Moreover, we discuss how vital algorithm parameters influence the performance indices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
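The "fire concentration" rule sketched in the abstract above can be illustrated with a toy target-assignment function: every ally is pointed at the opponent most cut off from its own swarm. The isolation measure (mean distance to teammates) and all names here are our illustrative reading, not the paper's exact rule:

```python
import math

def isolation(opponent, opponents):
    """Mean distance from an opponent to its own teammates; larger = more isolated."""
    others = [o for o in opponents if o is not opponent]
    if not others:
        return 0.0
    return sum(math.dist(opponent, o) for o in others) / len(others)

def fire_concentration_targets(allies, opponents):
    """Assign every ally the most isolated opponent, so firepower is
    concentrated on the tank that has strayed from its swarm."""
    target = max(opponents, key=lambda o: isolation(o, opponents))
    return {i: target for i in range(len(allies))}

allies = [(0, 0), (1, 0), (2, 0)]
opponents = [(10, 10), (11, 10), (30, 30)]  # (30, 30) has wandered off alone
targets = fire_concentration_targets(allies, opponents)
```

A fuller implementation would re-run the assignment each tick and spread fire once the isolated target is destroyed.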
4. Research on load frequency control of multi-microgrids in an isolated system based on the multi-agent soft actor-critic algorithm.
- Author
- Xie, Li Long, Li, Yonghui, Fan, Peixiao, Wan, Li, Zhang, Kanjun, and Yang, Jun
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, MULTIAGENT systems, DISTRIBUTED algorithms, ALGORITHMS, FREQUENCY stability, MICROGRIDS
- Abstract
Load variation, distributed power output uncertainty, and multi-microgrid network complexity pose great difficulties for the frequency stability of the whole microgrid. To address this problem, this paper uses a multi-agent deep reinforcement learning (DRL) algorithm to design controllers that regulate the frequency of the multi-microgrids. Firstly, a load frequency control (LFC) model for multi-microgrids was built. Secondly, based on the centralized training and decentralized execution (CTDE) multi-agent reinforcement learning (RL) framework, the multi-agent soft actor-critic (MASAC) algorithm was designed and applied to the multi-microgrids model. The state space and action space of the multi-agent system were established according to the frequency deviation of every sub-microgrid and the output of each distributed power source. The reward function was then established according to the frequency deviation. Appropriate neural network and training parameters were selected to generate the interconnected microgrid controllers through multiple rounds of pre-learning. Finally, the simulation study shows that the proposed MASAC controller can quickly maintain frequency stability when the system is disturbed. Sensitivity analysis shows that the MASAC controller can effectively cope with uncertainty in the system parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Differentiated Security Requirements: An Exploration of Microservice Placement Algorithms in Internet of Vehicles.
- Author
- Zhang, Xing, Liang, Jun, Lu, Yuxi, Zhang, Peiying, and Bi, Yanxian
- Subjects
- REINFORCEMENT learning, TECHNOLOGICAL innovations, ALGORITHMS, INTERNET, COMPUTER software development, INTERNET of things
- Abstract
In recent years, microservices, as an emerging technology in software development, have been favored by developers for their lightweight and low-coupling features and have been rapidly applied to the Internet of Things (IoT) and the Internet of Vehicles (IoV). Microservices deployed in each unit of the IoV transmit data over wireless links, which exposes a larger attack surface; precisely because of these features, the secure and efficient placement of microservices in this environment poses a serious challenge. Improving the security of all nodes in an IoV can significantly increase the service provider's operational costs and create security resource redundancy. As the application of reinforcement learning matures, it enables faster convergence of algorithms through well-designed agents and performs well in large-scale data environments. Inspired by this, this paper first abstractly models the placement network and placement behavior and sets security constraints. The environment information is fully extracted, and an asynchronous reinforcement-learning-based algorithm is designed to improve microservice placement and reduce security redundancy while ensuring the security requirements of the microservices. The experimental results show that the proposed algorithm performs well in terms of both the fit of the security index to user requirements and the request acceptance rate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Dynamic multi-strategy integrated differential evolution algorithm based on reinforcement learning for optimization problems.
- Author
- Yang, Qingyong, Chu, Shu-Chuan, Pan, Jeng-Shyang, Chou, Jyh-Horng, and Watada, Junzo
- Subjects
- DIFFERENTIAL evolution, REINFORCEMENT learning, ALGORITHMS, ENGINEERING design, SET functions, RANDOM sets
- Abstract
The introduction of a multi-population structure into the differential evolution (DE) algorithm has been proven to be an effective way to achieve algorithm adaptation and multi-strategy integration. However, in existing studies, the mutation strategy selection of each subpopulation during execution is fixed, resulting in poor self-adaptation of subpopulations. To solve this problem, a dynamic multi-strategy integrated differential evolution algorithm based on reinforcement learning (RLDMDE) is proposed in this paper. By employing reinforcement learning, each subpopulation can adaptively select its mutation strategy according to the current environmental state (population diversity). Based on the population state, this paper proposes an individual dynamic migration strategy to "reward" or "punish" the population and avoid wasting individual computing resources. Furthermore, this paper applies two methods, good point sets and random opposition-based learning (ROBL), in the population initialization stage to improve the quality of the initial solutions. Finally, to evaluate the performance of the RLDMDE algorithm, this paper selects two benchmark function sets, CEC2013 and CEC2017, and six engineering design problems for testing. The results demonstrate that the RLDMDE algorithm has good performance and strong competitiveness in solving optimization problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
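The core idea in the abstract above, letting each subpopulation pick its DE mutation strategy from the current diversity state via reinforcement learning, can be sketched with a tiny tabular Q-learning selector. The diversity buckets, strategy list, and learning constants below are illustrative assumptions, not the RLDMDE machinery itself:

```python
import random

STRATEGIES = ["rand/1", "best/1", "current-to-best/1"]  # classic DE mutation strategies

class StrategySelector:
    """Tabular Q-learning selector: the state is a coarse population-diversity
    bucket (low/medium/high) and the action is a DE mutation strategy."""
    def __init__(self, eps=0.1, alpha=0.5, seed=0):
        self.q = {(s, a): 0.0 for s in range(3) for a in STRATEGIES}
        self.eps, self.alpha = eps, alpha
        self.rng = random.Random(seed)

    def bucket(self, diversity):
        return 0 if diversity < 0.33 else (1 if diversity < 0.66 else 2)

    def select(self, diversity):
        # Epsilon-greedy: explore occasionally, otherwise exploit the Q-table.
        s = self.bucket(diversity)
        if self.rng.random() < self.eps:
            return self.rng.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda a: self.q[(s, a)])

    def update(self, diversity, strategy, reward):
        s = self.bucket(diversity)
        self.q[(s, strategy)] += self.alpha * (reward - self.q[(s, strategy)])

sel = StrategySelector()
# Pretend "best/1" keeps paying off (e.g. fitness improvement) when diversity is low.
for _ in range(20):
    sel.update(0.1, "best/1", 1.0)
```

In a full DE loop the reward would come from each generation's fitness improvement, and each subpopulation would hold its own selector.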
7. Multi-Objective Advantage Actor-Critic Algorithm for Hybrid Disassembly Line Balancing with Multi-Skilled Workers.
- Author
- Wang, Jiacun, Xi, Guipeng, Guo, Xiwang, Qin, Shujin, and Han, Henry
- Subjects
- ALGORITHMS, REINFORCEMENT learning, DETERMINISTIC algorithms, CARBON emissions, GENETIC algorithms, REINFORCEMENT (Psychology)
- Abstract
The scheduling of disassembly lines is of great importance for achieving optimized productivity. In this paper, we address the Hybrid Disassembly Line Balancing Problem, which combines linear and U-shaped disassembly lines, considers multi-skilled workers, and targets profit and carbon emissions. In contrast to common reinforcement learning approaches that typically employ weighting strategies to solve multi-objective problems, our approach innovatively incorporates non-dominated ranking directly into the reward function. The exploration of Pareto-frontier or better solutions is moderated by comparing performance between solutions and dynamically adjusting rewards when repeated solutions occur. The experimental results show that the multi-objective Advantage Actor-Critic algorithm based on Pareto optimization exhibits superior performance across six experimental cases of different scales, achieving a metrics superiority rate of 70%. In some of the experimental cases, the solutions produced by the multi-objective Advantage Actor-Critic algorithm show advantages over other popular algorithms such as the Deep Deterministic Policy Gradient algorithm, the Soft Actor-Critic algorithm, and the Non-Dominated Sorting Genetic Algorithm II. This further corroborates the effectiveness of our proposed solution. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
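The non-dominated ranking that the abstract above folds into the reward function rests on the standard Pareto-dominance test, which is compact enough to show directly. The objective encoding (profit, negated carbon, so "larger is better" on both axes) and the reward interpretation are our illustrative reading, not the paper's exact scheme:

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b: at least as good on every
    objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated(solutions):
    """Return the solutions not dominated by any other; in a reward scheme
    like the one the abstract describes, these could earn the higher reward."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Objective vectors are (profit, -carbon) so both axes are maximised.
sols = [(10, -5), (8, -3), (6, -8)]
front = nondominated(sols)  # (6, -8) is dominated by (10, -5)
```

For larger archives, NSGA-II-style fast non-dominated sorting replaces this quadratic scan, but the dominance test itself is unchanged.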
8. Route Planning Algorithms for Unmanned Surface Vehicles (USVs): A Comprehensive Analysis.
- Author
- Hashali, Shimhanda Daniel, Yang, Shaolong, and Xiang, Xianbo
- Subjects
- AUTONOMOUS vehicles, ALGORITHMS, REINFORCEMENT learning, FUZZY algorithms, ROAD maps, BEES algorithm
- Abstract
This review paper provides a structured analysis of obstacle avoidance and route planning algorithms for unmanned surface vehicles (USVs) spanning both numerical simulations and real-world applications. Our investigation encompasses the development of USV route planning from the year 2000 to date, classifying it into two main categories: global and local route planning. We emphasize the necessity for future research to embrace a dual approach incorporating both simulation-based assessments and real-world field tests to comprehensively evaluate algorithmic performance across diverse scenarios. Such evaluation systems offer valuable insights into the reliability, endurance, and adaptability of these methodologies, ultimately guiding the development of algorithms tailored to specific applications and evolving demands. Furthermore, we identify the challenges to determining optimal collision avoidance methods and recognize the effectiveness of hybrid techniques in various contexts. Remarkably, artificial potential field, reinforcement learning, and fuzzy logic algorithms emerge as standout contenders for real-world applications as consistently evaluated in simulated environments. The innovation of this paper lies in its comprehensive analysis and critical evaluation of USV route planning algorithms validated in real-world scenarios. By examining algorithms across different time periods, the paper provides valuable insights into the evolution, trends, strengths, and weaknesses of USV route planning technologies. Readers will benefit from a deep understanding of the advancements made in USV route planning. This analysis serves as a road map for researchers and practitioners by furnishing insights to advance USV route planning and collision avoidance techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Research on Obstacle Avoidance Planning for UUV Based on A3C Algorithm.
- Author
- Wang, Hongjian, Gao, Wei, Wang, Zhao, Zhang, Kai, Ren, Jingfei, Deng, Lihui, and He, Shanshan
- Subjects
- DEEP learning, REINFORCEMENT learning, DEEP reinforcement learning, MACHINE learning, ALGORITHMS, ARTIFICIAL intelligence
- Abstract
Deep reinforcement learning is an artificial intelligence technology that combines deep learning and reinforcement learning and has been widely applied in many fields. As a deep reinforcement learning algorithm, the A3C (Asynchronous Advantage Actor-Critic) algorithm can effectively utilize computing resources and improve training efficiency by training actor-critic networks across multiple threads. Inspired by the excellent performance of the A3C algorithm, this paper applies it to the UUV (Unmanned Underwater Vehicle) collision avoidance planning problem in unknown environments. The resulting collision avoidance planning algorithm plans in real time while ensuring a short path length, and its output action space satisfies the kinematic constraints of UUVs. For the UUV collision avoidance planning problem, this paper designs the state space, action space, and reward function. The simulation results show that the A3C collision avoidance planning algorithm can guide a UUV around obstacles to the preset target point. The planned path satisfies the UUV's heading constraints, and the planning time is short enough to meet real-time requirements. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Service Function Chain Deployment Algorithm Based on Deep Reinforcement Learning in Space–Air–Ground Integrated Network.
- Author
- Feng, Xu, He, Mengyang, Zhuang, Lei, Song, Yanrui, and Peng, Rumeng
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, END-to-end delay, ALGORITHMS, ENERGY consumption, RESOURCE management
- Abstract
SAGIN is formed by the fusion of ground networks and aircraft networks, overcoming the inability of terrestrial communication to cover the whole world and bringing new opportunities for network communication in remote areas. However, the many heterogeneous devices in SAGIN pose significant challenges for end-to-end resource management, and limited regional heterogeneous resources also threaten the QoS for users. In this regard, this paper proposes a hierarchical resource management structure for SAGIN, named SAGIN-MEC, based on SDN, NFV, and MEC, aiming to facilitate the systematic management of heterogeneous network resources. Furthermore, to minimize operator deployment costs while ensuring QoS, this paper formulates a resource scheduling optimization model tailored to SAGIN scenarios to minimize energy consumption. Additionally, we propose a deployment algorithm, named DRL-G, based on heuristics and DRL, to allocate heterogeneous network resources within SAGIN effectively. Experimental results showed that SAGIN-MEC can reduce the end-to-end delay by 6-15 ms compared to the terrestrial edge network, and that the DRL-G algorithm can improve the service request reception rate by up to 20% compared to other algorithms. In terms of energy consumption, it reduces average energy consumption by 4.4% compared to the PG algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Globally Guided Deep V-Network-Based Motion Planning Algorithm for Fixed-Wing Unmanned Aerial Vehicles.
- Author
- Du, Hang, You, Ming, and Zhao, Xinyi
- Subjects
- REINFORCEMENT learning, ALGORITHMS, DRONE aircraft, VERTICALLY rising aircraft
- Abstract
Fixed-wing UAVs have shown great potential in both military and civilian applications. However, achieving safe and collision-free flight in complex obstacle environments is still a challenging problem. This paper proposes a hierarchical two-layer fixed-wing UAV motion planning algorithm based on a global planner and a local reinforcement learning (RL) planner in the presence of static obstacles and other UAVs. Considering the kinematic constraints, a global planner is designed to provide reference guidance for the ego-UAV with respect to static obstacles. On this basis, a local RL planner is designed to accomplish kinodynamically feasible and collision-free motion planning that incorporates dynamic obstacles within the sensing range. Finally, in the simulation training phase, a multi-stage, multi-scenario training strategy is adopted, and the simulation results show that the performance of the proposed algorithm is significantly better than that of the baseline method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. An Adaptive Sampling Algorithm with Dynamic Iterative Probability Adjustment Incorporating Positional Information.
- Author
- Liu, Yanbing, Chen, Liping, Chen, Yu, and Ding, Jianwan
- Subjects
- PARTIAL differential equations, PIPE flow, FLUID mechanics, BURGERS' equation, REINFORCEMENT learning, ALGORITHMS
- Abstract
Physics-informed neural networks (PINNs) have garnered widespread use for solving a variety of complex partial differential equations (PDEs). Nevertheless, when addressing certain specific problem types, traditional sampling algorithms still reveal deficiencies in efficiency and precision. In response, this paper builds upon the progress of adaptive sampling techniques, addressing the inadequacy of existing algorithms to fully leverage the spatial location information of sample points, and introduces an innovative adaptive sampling method. This approach incorporates the Dual Inverse Distance Weighting (DIDW) algorithm, embedding the spatial characteristics of sampling points within the probability sampling process. Furthermore, it introduces reward factors derived from reinforcement learning principles to dynamically refine the probability sampling formula. This strategy more effectively captures the essential characteristics of PDEs with each iteration. We utilize sparsely connected networks and have adjusted the sampling process, which has proven to effectively reduce the training time. In numerical experiments on fluid mechanics problems, such as the two-dimensional Burgers' equation with sharp solutions, pipe flow, flow around a circular cylinder, lid-driven cavity flow, and Kovasznay flow, our proposed adaptive sampling algorithm markedly enhances accuracy over conventional PINN methods, validating the algorithm's efficacy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
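The general idea behind the adaptive sampling described in the abstract above, weighting candidate collocation points by both error and spatial position, can be sketched as follows. This is not the paper's DIDW formula; it is a generic distance-aware weighting (residual times distance to the nearest already-selected point) that conveys the same intuition, with all names and values illustrative:

```python
import math

def sampling_probs(candidates, selected, residuals):
    """Weight each candidate point by its PDE residual times its distance to
    the nearest already-selected sample, so high-error, under-sampled regions
    are favoured; normalise into a probability distribution."""
    weights = []
    for c, r in zip(candidates, residuals):
        d = min((math.dist(c, s) for s in selected), default=1.0)
        weights.append(r * d)
    total = sum(weights)
    return [w / total for w in weights]

# Three candidate points in the unit square; one point is already sampled.
cands = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]
chosen = [(0.1, 0.1)]
resid = [1.0, 1.0, 1.0]  # equal residuals isolate the positional effect
probs = sampling_probs(cands, chosen, resid)  # farthest point gets highest mass
```

With equal residuals, the probability mass shifts entirely to spatial coverage; in practice the residuals dominate near sharp solution features.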
13. A multi-setpoint cooling control approach for air-cooled data centers using the deep Q-network algorithm.
- Author
- Chen, Yaohua, Guo, Weipeng, Liu, Jinwen, Shen, Songyu, Lin, Jianpeng, and Cui, Delong
- Subjects
- SERVER farms (Computer network management), DEEP reinforcement learning, REINFORCEMENT learning, INFORMATION technology equipment, ALGORITHMS, ATMOSPHERIC temperature
- Abstract
Cooling systems provide a safe thermal environment for the reliable operation of IT equipment in data centers (DCs) while consuming significant energy. Therefore, to achieve energy savings in cooling system control under the dynamic thermal distribution in DCs, this paper proposes a multi-setpoint cooling control approach based on deep reinforcement learning (DRL). Firstly, a thermal model based on the XGBoost algorithm is constructed to precisely evaluate the thermal distribution in the rack room and guide real-time cooling control. Secondly, a multi-setpoint cooling control approach based on the deep Q-network algorithm (DQN-MSP) is designed to finely regulate the supply air temperature of each air conditioner by capturing thermal fluctuations, ensuring a dynamic balance between cooling supply and demand. Finally, we adopt the extended CloudSimPy simulation tool and a real workload trace from the PlanetLab system to evaluate the effectiveness and performance of the proposed approach. The simulation results show that the proposed control solution reduces cooling energy consumption by over 2.4% by raising the average supply air temperature of the air conditioners while satisfying the thermal constraints. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Learning Ad Hoc Cooperation Policies from Limited Priors via Meta-Reinforcement Learning.
- Author
- Fang, Qi, Zeng, Junjie, Xu, Haotian, Hu, Yue, and Yin, Quanjun
- Subjects
- COOPERATION, AD hoc computer networks, REINFORCEMENT learning, INSTITUTIONAL repositories, ALGORITHMS
- Abstract
When agents need to collaborate without previous coordination, the multi-agent cooperation problem transforms into an ad hoc teamwork (AHT) problem. Mainstream research on AHT is divided into type-based and type-free methods. The former depends on known teammate types to infer the current teammate type, while the latter does not require them at all. However, in many real-world applications, the complete absence and sufficient knowledge of known types are both impractical. Thus, this research focuses on the challenge of AHT with limited known types. To this end, this paper proposes a method called a Few typE-based Ad hoc Teamwork via meta-reinforcement learning (FEAT), which effectively adapts to teammates using a small set of known types within a single episode. FEAT enables agents to develop a highly adaptive policy through meta-reinforcement learning by employing limited priors about known types. It also utilizes this policy to generate a diverse type repository automatically. During the ad hoc cooperation, the agent can autonomously identify known teammate types followed by directly utilizing the pre-trained optimal cooperative policy or swiftly updating the meta policy to respond to teammates of unknown types. Comprehensive experiments in the pursuit domain validate the effectiveness of the algorithm and its components. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. An efficient beaconing of bluetooth low energy by decision making algorithm.
- Author
- Fujisawa, Minoru, Yasuda, Hiroyuki, Isogai, Ryosuke, Arai, Maki, Yoshida, Yoshifumi, Li, Aohan, Kim, Song-Ju, and Hasegawa, Mikio
- Subjects
- ARTIFICIAL intelligence, DECISION making, WIRELESS communications, ALGORITHMS
- Abstract
Ongoing research endeavors are exploring the potential of artificial intelligence to enhance the efficiency of wireless communication systems. Nevertheless, complex computational mechanisms, such as those inherent in neural networks, are not well suited to applications where reducing computational intricacy is of paramount importance. The rise in Bluetooth-enabled devices has led to the widespread adoption of Bluetooth Low Energy (BLE) in various IoT applications, primarily due to its low power consumption. For specific applications, such as lost-and-found tags that operate on small batteries, it is especially important to further reduce power usage. With the objective of achieving low power consumption by optimally selecting channels and advertisement intervals, this paper introduces a parameter selection method derived from the Multi-Armed Bandit (MAB) algorithm, a technique known for addressing human decision-making challenges. In this study, we evaluate the proposed method using simulations in diverse environments. The outcomes indicate that, without compromising much reliability, our approach can reduce power consumption by up to 40% depending on the wireless surroundings. Additionally, when this method was implemented on an actual BLE device, it reduced power consumption by about 35% in real environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
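The parameter-selection setting in the abstract above, choosing BLE advertising channels and intervals to minimise energy, maps naturally onto a multi-armed bandit. A minimal epsilon-greedy sketch follows; the arm set, reward shaping, and constants are illustrative assumptions, since the paper's MAB variant and reward definition may differ:

```python
import random

# BLE advertises on channels 37-39; intervals in milliseconds are illustrative.
ARMS = [(ch, interval) for ch in (37, 38, 39) for interval in (100, 500, 1000)]

class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over (advertising channel, interval) pairs.
    Reward could be, e.g., delivered beacons per unit of energy."""
    def __init__(self, eps=0.1, seed=0):
        self.eps = eps
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in ARMS}
        self.values = {a: 0.0 for a in ARMS}

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(ARMS)          # explore
        return max(ARMS, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n  # incremental mean

bandit = EpsilonGreedyBandit()
# Pretend channel 39 with a long interval yields the best energy efficiency.
for _ in range(50):
    bandit.update((39, 1000), 1.0)
    bandit.update((37, 100), 0.2)
```

This kind of tabular update is cheap enough to run on a tag-class microcontroller, which is the point the abstract makes against neural approaches.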
16. Personalized Treatment Policies with the Novel Buckley-James Q-Learning Algorithm.
- Author
- Lee, Jeongjin and Kim, Jong-Min
- Subjects
- MACHINE learning, ALGORITHMS, SURVIVAL analysis (Biometry), TIME management, PATIENT care, REINFORCEMENT learning
- Abstract
This research paper presents the Buckley-James Q-learning (BJ-Q) algorithm, a cutting-edge method designed to optimize personalized treatment strategies, especially in the presence of right censoring. We critically assess the algorithm's effectiveness in improving patient outcomes and its resilience across various scenarios. Central to our approach is the innovative use of the survival time to impute the reward in Q-learning, employing the Buckley-James method for enhanced accuracy and reliability. Our findings highlight the significant potential of personalized treatment regimens and introduce the BJ-Q learning algorithm as a viable and promising approach. This work marks a substantial advancement in our comprehension of treatment dynamics and offers valuable insights for augmenting patient care in the ever-evolving clinical landscape. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
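The key move in the abstract above is imputing a reward for right-censored survival times before running Q-learning. The Buckley-James method does this with a model-based conditional expectation; the sketch below substitutes a crude empirical stand-in (mean of observed event times beyond the censoring time) just to show the shape of the idea. The function and data are illustrative, not the BJ-Q algorithm:

```python
def impute_censored(time, censored, uncensored_times):
    """For a right-censored observation, impute the outcome as the mean of
    observed event times exceeding the censoring time: a crude empirical
    proxy for the Buckley-James conditional expectation E[T | T > c]."""
    if not censored:
        return time  # fully observed: use the event time as-is
    later = [t for t in uncensored_times if t > time]
    return sum(later) / len(later) if later else time

events = [5.0, 8.0, 12.0, 20.0]       # fully observed survival times
r = impute_censored(10.0, True, events)  # patient censored at t = 10
```

The imputed value would then serve as the reward in the Q-learning update in place of the unobserved survival time.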
17. DDPG-Based Convex Programming Algorithm for the Midcourse Guidance Trajectory of Interceptor.
- Author
- Li, Wan-Li, Li, Jiong, Ye, Ji-Kun, Shao, Lei, and Zhou, Chi-Jun
- Subjects
- REINFORCEMENT learning, DEEP reinforcement learning, MACHINE learning, NONCONVEX programming, CONVEX programming, ALGORITHMS, APPROXIMATION error
- Abstract
To address the problem of low accuracy and efficiency in trajectory planning algorithms for interceptors facing multiple constraints during the midcourse guidance phase, an improved trajectory convex programming method based on the lateral distance domain is proposed. This algorithm can achieve fast trajectory planning, reduce the approximation error of the planned trajectory, and improve the accuracy of trajectory guidance. First, the concept of lateral distance domain is proposed, and the motion model of the midcourse guidance segment in the interceptor is converted from the time domain to the lateral distance domain. Second, the motion model and multiple constraints are convexly and discretely transformed, and the discrete trajectory convex model is established in the lateral distance domain. Third, the deep reinforcement learning algorithm is used to learn and train the initial solution of trajectory convex programming, and a high-quality initial solution trajectory is obtained. Finally, a dynamic adjustment method based on the distribution of approximate solution errors is designed to achieve efficient dynamic adjustment of grid points in iterative solving. The simulation experiments show that the improved trajectory convex programming algorithm proposed in this paper not only improves the accuracy and efficiency of the algorithm but also has good optimization performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Switching threshold event-triggered critic algorithm for optimal orbit tracking and formation motion.
- Author
- Yu, Rui, Chen, Yang-Yang, and Zhang, Ya
- Subjects
- ORBITS (Astronomy), ALGORITHMS, CRITICS, MULTIAGENT systems, REINFORCEMENT learning
- Abstract
This paper deals with orbit tracking and formation motion problems with optimal energies and reduced computational cost. First, the orbit tracking and formation motion are decomposed into movements in the normal and tangent directions of level orbits, respectively, and the optimal value functions in both directions are defined. Then, to reduce the computational cost, a switching threshold event-triggered (STET) mechanism is designed. Based on the STET mechanism, the optimal value functions are constructed to evaluate the optimal energies of orbit tracking and formation motion. Critic neural networks are then designed to approximate the optimal value functions, which yield the optimal policies along the normal and tangent directions of the desired orbits; this is the so-called switching threshold event-triggered critic algorithm (STET-C). A detailed theoretical analysis of system convergence is given. Finally, two comparison simulations are presented. The first verifies the energy optimality of STET-C compared with feedback controllers. The second shows that STET-C significantly reduces the computational cost in contrast with the non-triggered actor-critic algorithm, the non-triggered critic algorithm, and the relative threshold event-triggered critic algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. MOEA with adaptive operator based on reinforcement learning for weapon target assignment.
- Author
- Zou, Shiqi, Shi, Xiaoping, and Song, Shenmin
- Subjects
- REINFORCEMENT learning, ALGORITHMS, MACHINE learning, ARTIFICIAL intelligence, DIGITAL technology
- Abstract
Weapon target assignment (WTA) is a typical problem in the command and control of modern warfare. Despite the significance of the problem, traditional algorithms still have shortcomings in terms of efficiency, solution quality, and generalization. This paper presents a novel multi-objective evolutionary optimization algorithm (MOEA) that integrates a deep Q-network (DQN)-based adaptive mutation operator and a greedy-based crossover operator, designed to enhance solution quality for the multi-objective WTA (MO-WTA). Our approach (NSGA-DRL) evolves NSGA-II by embedding these operators to strike a balance between exploration and exploitation. The DQN-based adaptive mutation operator is developed to predict high-quality solutions, thereby improving the exploration process and maintaining diversity within the population. In parallel, the greedy-based crossover operator employs domain knowledge to minimize ineffective searches, focusing on exploitation and expediting convergence. Ablation studies revealed that the proposed operators significantly boost algorithm performance. In particular, the DQN mutation operator shows its predictive effectiveness in identifying candidate solutions. The proposed NSGA-DRL outperforms state-of-the-art MOEAs in solving MO-WTA problems by generating high-quality solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Autonomous Obstacle Avoidance Algorithm for Unmanned Aerial Vehicles Based on Deep Reinforcement Learning.
- Author
- Yuan Gao, Ling Ren, Tianwei Shi, Teng Xu, and Jianbang Ding
- Subjects
- REINFORCEMENT learning, DEEP reinforcement learning, DIGITAL-to-analog converters, DRONE aircraft, CONVOLUTIONAL neural networks, ALGORITHMS, AUTONOMOUS vehicles
- Abstract
To overcome the challenges of obstacle avoidance for Unmanned Aerial Vehicles (UAVs) in autonomous flights, this paper proposes the Dual Experience Attention Convolution Soft Actor-Critic (DAC-SAC) algorithm. This algorithm integrates a dual experience buffer pool, a self-attention mechanism, and the Soft-Actor-Critic algorithm with a convolutional network. The dual experience buffer pools are used to solve the problem of ineffective UAV training due to the scarcity of successful training data. To overcome the drawbacks of the original Soft Actor-Critic (SAC) algorithm in handling image data, a Convolutional Neural Network (CNN) is applied to reconstruct the actor and critic network, allowing for better image feature extraction and classification. Furthermore, a self-attention mechanism is employed by adding a convolutional self-attention layer to the network. This modification enables dynamic adjustments for the attention weights based on varying input image features, effectively addressing focus-related challenges. Two simulation experiments are performed and the DAC-SAC algorithm achieves a 99.5% success rate in a known environment and an 84.8% success rate when dealing with an unknown environment. These results confirm that the proposed algorithm enables autonomous obstacle avoidance for UAVs even when considering depth images as input. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Large-scale UAV swarm confrontation based on hierarchical attention actor-critic algorithm.
- Author
-
Nian, Xiaohong, Li, Mengmeng, Wang, Haibo, Gong, Yalei, and Xiong, Hongyun
- Subjects
REINFORCEMENT learning ,ALGORITHMS ,DRONE aircraft - Abstract
In large-scale unmanned aerial vehicle (UAV) swarm confrontation scenarios, the design of decision-making and coordination strategies becomes extremely difficult. Multi-Agent Reinforcement Learning (MARL), as a novel decision-making approach to address this issue, faces challenges such as poor scalability and the curse of dimensionality. To overcome these challenges, this paper proposes a Hierarchical Attention Actor-Critic (HAAC) algorithm. The HAAC algorithm includes a centralized critic network based on a Hierarchical Two-stage Attention Network (H2ANet), along with a hierarchical actor policy network that combines rules and reinforcement learning approaches. H2ANet is specifically designed to model the relationships between UAVs and extract crucial information from neighboring UAVs, enabling the generation of advanced cooperative and competitive strategies. The HAAC algorithm effectively reduces the dimensionality of both action and state spaces. Experimental results demonstrate that the HAAC algorithm outperforms existing methods and is able to extend its learned policies to large-scale scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Fractional-Order Control Method Based on Twin-Delayed Deep Deterministic Policy Gradient Algorithm.
- Author
-
Jiao, Guangxin, An, Zhengcai, Shao, Shuyi, and Sun, Dong
- Subjects
- *
RADIAL basis functions , *SLIDING mode control , *REINFORCEMENT learning , *ALGORITHMS , *MACHINE learning , *CLOSED loop systems - Abstract
In this paper, a fractional-order control method based on the twin-delayed deep deterministic policy gradient (TD3) algorithm in reinforcement learning is proposed. A fractional-order disturbance observer is designed to estimate the disturbances, and a radial basis function network is selected to approximate uncertainties in the system. Then, a fractional-order sliding-mode controller is constructed to control the system, and the parameters of the controller are tuned using the TD3 algorithm, optimizing the control performance. The results show that the fractional-order control method based on the TD3 algorithm can not only improve closed-loop system performance under different operating conditions but also enhance the signal tracking capability. [ABSTRACT FROM AUTHOR]
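For context, the core TD3 backup that such parameter tuning relies on is the clipped double-Q target, which takes the smaller of two critic estimates to curb overestimation, together with clipped noise for target-policy smoothing. Plain floats stand in for the critic networks in this hedged sketch.

```python
# Minimal sketch of TD3's two defining tricks, with scalars in place of
# neural networks; reward, Q-values, and gamma are illustrative numbers.

def td3_target(reward, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q backup: y = r + gamma * min(Q1', Q2')."""
    return reward + gamma * min(q1_next, q2_next)

def clipped_noise(raw, clip=0.5):
    """Target-policy smoothing noise, clipped to [-clip, clip]."""
    return max(-clip, min(clip, raw))

# Taking the pessimistic critic limits value overestimation:
y = td3_target(reward=1.0, q1_next=10.0, q2_next=8.0)   # 1.0 + 0.99 * 8.0
```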
- Published
- 2024
- Full Text
- View/download PDF
23. Performance Evaluation of Multi-Agent Reinforcement Learning Algorithms.
- Author
-
Abdulghani, Abdulghani M., Abdulghani, Mokhles M., Walters, Wilbur L., and Abed, Khalid H.
- Subjects
MACHINE learning ,MARL ,REINFORCEMENT learning ,DISTRIBUTED algorithms ,MULTIAGENT systems ,AUGMENTED reality ,ALGORITHMS - Abstract
Multi-Agent Reinforcement Learning (MARL) has proven to be successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act in one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as real-world scenarios in robotics. In this paper, we explore the realm of MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we utilize the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-Agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage. [ABSTRACT FROM AUTHOR]
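Of the four algorithms above, VDN has the simplest structure and can be sketched directly: the joint action value is the sum of per-agent utilities, so each agent can act greedily on its own Q-values while the team optimizes the joint value. The tables below are toy numbers, not SMAC values.

```python
# Sketch of value decomposition, the idea behind VDN; per-agent Q-values
# would normally come from networks, here they are illustrative lists.

def vdn_joint_q(per_agent_q):
    """VDN factorization: Q_tot(s, a_1..a_n) = sum_i Q_i(s_i, a_i)."""
    return sum(per_agent_q)

def greedy_actions(q_tables):
    """Each agent independently maximizes its own utility."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_tables]

q_tables = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.8]]   # 3 agents, 2 actions each
acts = greedy_actions(q_tables)
q_tot = vdn_joint_q(q[a] for q, a in zip(q_tables, acts))
```

Because the decomposition is additive, per-agent greedy choices jointly maximize Q_tot, which is what makes decentralized execution possible after centralized training.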
- Published
- 2024
- Full Text
- View/download PDF
24. Edge Computing Task Offloading of Internet of Vehicles Based on Improved MADDPG Algorithm.
- Author
-
Ziyang Jin, Yijun Wang, and Jingying Lv
- Subjects
EDGE computing ,ALGORITHMS ,REINFORCEMENT learning ,INTERNET ,MACHINE learning - Abstract
Edge computing is frequently employed in the Internet of Vehicles (IoV), but the computation and communication capabilities of roadside units with edge servers are limited. As a result, to perform distributed machine learning on resource-limited MEC systems, resources have to be allocated sensibly. This paper presents an improved MADDPG algorithm to overcome the current IoV concerns of high delay and limited offloading utility. Firstly, we employ the MADDPG algorithm for task offloading. Secondly, the edge server aggregates the updated model and modifies the aggregation model parameters to achieve optimal policy learning. Finally, the new approach is contrasted with current reinforcement learning techniques. The simulation results show that, compared with the MADDPG and MAA2C algorithms, our algorithm improves offloading utility by 2% and 9%, respectively, and reduces delay by 29.6%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A Strong Maneuvering Target-Tracking Filtering Based on Intelligent Algorithm.
- Author
-
Li, Jing, Liang, Xinru, Yuan, Shengzhi, Li, Haiyan, and Gao, Changsheng
- Subjects
- *
MANEUVERING boards , *REINFORCEMENT learning , *ALGORITHMS - Abstract
In this paper, a variable-structure multimodel (VSMM) filtering algorithm based on the long short-term memory (LSTM) regression-deep Q network (L-DQN) is proposed to accurately track strongly maneuvering targets. By reasonably designing the reward function, state space, and network structure, the algorithm maps the selection of the model set to the selection of an action label, so that a deep reinforcement-learning agent replaces the model switching of the traditional VSMM algorithm. At the same time, the algorithm introduces an LSTM component that compensates for errors in the tracking results based on model history information. The simulation results show that, compared with the traditional VSMM algorithm, the proposed algorithm can quickly capture the maneuvering of the target, the response time is short, the calculation accuracy is significantly improved, and the range of adaptation is wider, achieving precise tracking of maneuvering targets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Autonomous Shape Decision Making of Morphing Aircraft with Improved Reinforcement Learning.
- Author
-
Jiang, Weilai, Zheng, Chenghong, Hou, Delong, Wu, Kangsheng, and Wang, Yaonan
- Subjects
DECISION making ,ALGORITHMS - Abstract
The autonomous shape decision-making problem of a morphing aircraft (MA) with a variable wingspan and sweep angle is studied in this paper. Considering the continuity of state space and action space, a more practical autonomous decision-making algorithm framework of MA is designed based on the deep deterministic policy gradient (DDPG) algorithm. Furthermore, the DDPG with a task classifier (DDPGwTC) algorithm is proposed in combination with the long short-term memory (LSTM) network to improve the convergence speed of the algorithm. The simulation results show that the shape decision-making algorithm based on the DDPGwTC enables MA to adopt the optimal morphing strategy in different task environments with higher autonomy and environmental adaptability, which verifies the effectiveness of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Bandit algorithms: A comprehensive review and their dynamic selection from a portfolio for multicriteria top-k recommendation.
- Author
-
Letard, Alexandre, Gutowski, Nicolas, Camp, Olivier, and Amghar, Tassadit
- Subjects
- *
RECOMMENDER systems , *FUZZY sets , *ALGORITHMS , *REINFORCEMENT learning - Abstract
This paper discusses the use of portfolio approaches based on bandit algorithms to optimize multicriteria decision-making in recommender systems (accuracy and diversity). While previous research has primarily focused on single-item recommendations, this study extends the research to consider the recommendation of several items per iteration. Two methods, Multiple-play Gorthaur and Budgeted-Gorthaur, are proposed to solve the algorithm selection problem and their performances on real-world datasets are compared. Both methods provide a generalization of the Gorthaur method, which enables it to operate with any Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB) algorithm as meta-algorithm in a multi-item recommendation scenario. For Multiple-play Gorthaur, an empirical evaluation shows that the use of Thompson Sampling for algorithm selection (Gorthaur-TS) yields better results than the original EXP3 method (Gorthaur-EXP3) and the exclusive use of the optimal algorithm in the portfolio in contextual recommendation problems. Additionally, the paper includes a theoretical regret analysis based on the TS sketch proof applied for this variant of the method. Concerning Budgeted-Gorthaur, experiments show that it allows more flexibility to achieve a suitable trade-off between criteria and a broader coverage of the Pareto set of solutions, overcoming a natural limit of "a-priori" methods. Finally, this paper provides a detailed review, including pseudocodes and theoretical bounds, for all the fundamental MAB and CMAB algorithms used in this study. • Bandit literature lacks formal algorithm review, hindering clarity and comparability. • There is no silver bullet: no algorithm can be the best performer in every instance. • Recommender systems need to balance accuracy, diversity, multi-item recommendations. • Optimal algorithm balances criteria, matching decision maker's preferred trade-off. 
• Dynamic selection ensures safe performance when optimal algorithm is unknown. [ABSTRACT FROM AUTHOR]
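The Gorthaur-TS idea, Thompson Sampling as the meta-level chooser of which portfolio algorithm serves the next round, can be sketched with a Beta-Bernoulli sampler. The portfolio size and per-algorithm success rates below are hypothetical, and the binary reward is a simplification of multicriteria feedback.

```python
import random

# Hedged sketch of TS-based algorithm selection over a bandit portfolio:
# sample one Beta draw per member, run the argmax, update its counts.

def thompson_pick(successes, failures, rng):
    """Sample Beta(s+1, f+1) per portfolio member; pick the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

rng = random.Random(0)
true_p = [0.3, 0.7]          # hypothetical per-algorithm success rates
s, f = [0, 0], [0, 0]
for _ in range(2000):
    k = thompson_pick(s, f, rng)
    if rng.random() < true_p[k]:   # simulated user feedback
        s[k] += 1
    else:
        f[k] += 1
```

Over the rounds the sampler concentrates its picks on the stronger portfolio member while still occasionally exploring the weaker one.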
- Published
- 2024
- Full Text
- View/download PDF
28. Adaptive reinforcement learning-based control using proximal policy optimization and slime mould algorithm with experimental tower crane system validation.
- Author
-
Zamfirache, Iuliu Alexandru, Precup, Radu-Emil, and Petriu, Emil M.
- Subjects
TOWER cranes ,METAHEURISTIC algorithms ,REINFORCEMENT learning ,DISCRIMINATION against overweight persons ,ALGORITHMS ,FIXED interest rates - Abstract
This paper presents a novel optimal reference tracking control approach resulting from the combination of a popular policy gradient Reinforcement Learning (RL) algorithm, namely Proximal Policy Optimization (PPO), and a metaheuristic Slime Mould Algorithm (SMA). One of the most important parameters in the PPO-based RL process is the learning rate, which has a big impact on how the parameters of the actor neural network (NN) are iteratively updated. In every episode of the RL process, the weights and the biases of the actor NN are multiplied with the learning rate, determining how far the learning agent will step in a certain direction computed based on previous experiences. The classical PPO algorithm usually relies on fixed values for the learning rates, which rarely change, or not at all, during the learning process. However, its main drawback is that the learning agent cannot take advantage of positive momentum in the learning process by accelerating towards good learning experiences, or slow down and quickly change direction in the case of consecutive negative learning experiences. The main objective of the combination proposed in this paper is to create an adaptive SMA-based PPO approach applied to control systems which, instead of using fixed learning rate values, uses the SMA to compute optimal values of the learning rates in each time step of the learning process based on the progress of the learning agent. This paper investigates whether the adaptive SMA-based PPO control approach can be considered as an alternative to the classical PPO version, which employs fixed values of the learning rate. A comparison is carried out using control system performance indices gathered while performing an optimal reference tracking control task on tower crane system laboratory equipment. • A combination of Proximal Policy Optimization and metaheuristic SMA is given. • The adaptive approach mitigates the drawbacks of constant learning rates. 
• SMA computes optimal values of learning rates in each learning process step. • A comparison with classical Proximal Policy Optimization is carried out. • The validation is done on real-time nonlinear tower crane position control. [ABSTRACT FROM AUTHOR]
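A deliberately simplified stand-in (not the SMA itself) conveys the adaptive learning-rate idea: the rate grows after improvements and shrinks after setbacks, instead of staying fixed as in classical PPO. The multipliers and bounds below are assumed values for illustration.

```python
# Hypothetical sketch of per-episode learning-rate adaptation; the real
# approach optimizes the rate with the Slime Mould Algorithm instead of
# this simple multiplicative rule.

def adapt_lr(lr, improved, up=1.2, down=0.5, lo=1e-5, hi=1e-2):
    """Grow the actor learning rate on progress, cut it on setbacks."""
    lr = lr * (up if improved else down)
    return min(hi, max(lo, lr))   # keep the rate inside safe bounds

lr = 1e-3
for improved in [True, True, False]:   # two good episodes, then a bad one
    lr = adapt_lr(lr, improved)
```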
- Published
- 2024
- Full Text
- View/download PDF
29. An information entropy-driven evolutionary algorithm based on reinforcement learning for many-objective optimization.
- Author
-
Liang, Peng, Chen, Yangtao, Sun, Yafeng, Huang, Ying, and Li, Wei
- Subjects
- *
REINFORCEMENT learning , *EVOLUTIONARY algorithms , *OPTIMIZATION algorithms , *ALGORITHMS , *CLASSROOM environment - Abstract
Many-objective optimization problems (MaOPs) are challenging tasks involving optimizing many conflicting objectives simultaneously. Decomposition-based many-objective evolutionary algorithms have effectively maintained a balance between convergence and diversity in recent years. However, these algorithms face challenges in accurately approximating the complex geometric structure of irregular Pareto fronts (PFs). In this paper, an information entropy-driven evolutionary algorithm based on reinforcement learning (RL-RVEA) for many-objective optimization with irregular Pareto fronts is proposed. The proposed algorithm leverages reinforcement learning to guide the evolution process by interacting with the environment to learn the shape and features of the PF, adaptively adjusting the distribution of reference vectors to cover the PF structure effectively. Moreover, an information entropy-driven adaptive scalarization approach is designed to reflect the diversity of nondominated solutions, which enables the algorithm to adaptively balance multiple competing objectives and select solutions efficiently while maintaining individual diversity. To verify its effectiveness, RL-RVEA is compared with seven state-of-the-art algorithms on the DTLZ, MaF, and WFG test suites and four real-world MaOPs. The results of the experiments demonstrate that the suggested algorithm provides a novel and practical method for addressing MaOPs with irregular PFs. • A novel RL-RVEA addresses many-objective optimization with irregular Pareto fronts. • A reinforcement learning-based adaptive reference vector guides the direction of convergence. • A scalarization approach preserves the diversity of solutions for the next generation. • RL-RVEA outperforms seven advanced many-objective optimization algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Intelligent adaptive lighting algorithm: Integrating reinforcement learning and fuzzy logic for personalized interior lighting.
- Author
-
Vashishtha, Kritika, Saad, Anas, Faieghi, Reza, and Xi, Fengfeng
- Subjects
- *
INTERIOR lighting , *FUZZY logic , *MACHINE learning , *AIRCRAFT cabins , *ALGORITHMS , *INTELLIGENT tutoring systems , *REINFORCEMENT learning , *ONLINE algorithms - Abstract
Lighting requirements are subjective, and one light setting cannot work for everyone. However, there is little work on developing smart lighting algorithms that can adapt to user preferences. To address this gap, this paper uses fuzzy logic and reinforcement learning to develop an adaptive lighting algorithm. In particular, we develop a baseline fuzzy inference system (FIS) using domain knowledge, generating light recommendations based on a set of intuitive rules. These rules, derived from existing literature, are based on environmental conditions, i.e., the daily glare index, and user information including age, activity, and chronotype. Through a feedback mechanism, the user interacts with the algorithm, correcting the algorithm's output to their preferences. We interpret these corrections as rewards to a Q-learning algorithm, which tunes the FIS parameters online to match the user preferences. Q-learning is a model-free learning algorithm that learns to act optimally through interaction with the user and the rewards it receives. This allows the proposed algorithm to work in a model-free manner, effectively handling the uncertainties that might arise from the individualistic preferences of users. To the authors' best knowledge, this algorithm is pioneering work in designing intelligent algorithms for personalized lighting control, featuring several elements of novelty, including the number of environmental and user-related inputs, the continuous control of light intensity as opposed to common on/off control, and the ability to learn user preferences. The algorithm is implemented in a real aircraft cabin and is evaluated in an extensive user study. The implementation results demonstrate that the developed algorithm possesses the capability to learn user preferences while successfully adapting to a wide range of environmental conditions and user characteristics. 
This underscores its viability as a potent solution for intelligent light management, featuring advanced learning capabilities. • Intelligent lighting algorithm with the ability to learn from and adapt to user preferences. • A fuzzy inference system tunes lighting based on environment and user traits. • The developed algorithm is tested and verified via an in-depth user study. [ABSTRACT FROM AUTHOR]
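The tabular Q-learning update at the heart of such a tuning loop, with the user's correction acting as the reward signal, can be sketched as follows; the state and action discretizations are hypothetical, not the paper's design.

```python
# Hedged sketch of one Q-learning step; "dim"/"bright" states and two
# actions per state are made-up discretizations for illustration.

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

Q = {"dim": [0.0, 0.0], "bright": [0.0, 1.0]}
# Positive reward: the user accepted the algorithm's brighter suggestion.
new_q = q_update(Q, "dim", 1, r=0.5, s_next="bright")
```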
- Published
- 2024
- Full Text
- View/download PDF
31. Comparative analysis of grid-interactive building control algorithms: From model-based to learning-based approaches.
- Author
-
Biagioni, David, Zhang, Xiangyu, Adcock, Christiane, Sinner, Michael, Graf, Peter, and King, Jennifer
- Subjects
- *
ARTIFICIAL intelligence , *SOFTWARE frameworks , *COMPARATIVE studies , *RESEARCH personnel , *ALGORITHMS , *KNOWLEDGE gap theory , *MACHINE learning - Abstract
Grid-interactive building control poses a critical challenge in the context of grid modernization and decarbonization. Recently, various artificial intelligence-based optimal control approaches have been proposed, providing innovative solutions to optimal building control problems. However, researchers and practitioners are now confronted with a dilemma when selecting appropriate control strategies, ranging from model-based (knowledge-incorporating) to learning-based (data-driven) approaches, and hybrid methods combining them. Although each algorithm in existing literature claims superiority over specific baselines, their performance has never been systematically compared and analyzed, owing to the absence of a unified platform for comprehensive evaluation. To fill this knowledge gap, we identify and implement all state-of-the-art approaches within a modular training and evaluation framework, assessing their efficacy in a grid-interactive building control problem. In this paper, we also introduce a streamlined hybrid method that complements existing hybrid approaches. Our comparative study reveals and quantifies the advantages of hybrid methods: on average, they achieve near-optimal control while requiring less than 14% of the online computation of traditional model-based methods. To achieve this performance, they need as few as 2% of the training samples of purely learning-based methods. Finally, we provide insights into the merits, limitations, and implementation of each method to help researchers better understand the state of the art and future directions. The software framework implemented in this study is open-sourced to facilitate future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Multitask Augmented Random Search in deep reinforcement learning.
- Author
-
Thanh, Le Tien, Thang, Ta Bao, Cuong, Le Van, and Binh, Huynh Thi Thanh
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,OPTIMIZATION algorithms ,COMPUTER multitasking ,EVOLUTIONARY algorithms ,ALGORITHMS - Abstract
Reinforcement Learning (RL) has gained significant popularity in recent years for its ability to solve complex control problems. However, most existing RL algorithms are designed to train policies for each environment in isolation, limiting their applicability to real-world scenarios with many related environments. Recently, many multitask optimization algorithms have been proposed and successfully applied to a wide range of optimization problems. However, existing studies mainly focus on solving multiple continuous function benchmarks and overlook the potential of tailoring towards RL. In this paper, we propose a simple multitask optimization algorithm called Multitask Augmented Random Search (MARS) that trains multiple RL agents together and exploits the performance surplus from highly correlated tasks. MARS is a modification of the simple random-search-based Augmented Random Search (ARS) algorithm, which has been shown to outperform more complicated methods such as Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO) in solving continuous control MuJoCo environments. The experimental results also demonstrate that our proposed algorithm is more consistent in solving different instances of the MuJoCo benchmark than ARS, the Multifactorial Evolution Algorithm (MFEA), and Adaptive MFEA RL (AMFEARL) within the same number of training episodes. • Propose a multitask extension of ARS for training related continuous control problems. • Derive an online data-driven measure to preempt the transfer of incompatible solutions. • Conduct experiments on synthetic benchmarks and in the multitask MuJoCo environments. [ABSTRACT FROM AUTHOR]
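A bare-bones single-task ARS step, the building block MARS extends, probes mirrored random perturbations of the policy weights and moves along the reward-weighted average direction. The toy quadratic objective, step sizes, and direction count below are illustrative, not the paper's configuration.

```python
import random

# Illustrative ARS update: finite-difference policy search without gradients.
def ars_step(theta, reward_fn, n_dirs=4, nu=0.05, alpha=0.5, rng=random.Random(0)):
    deltas = [[rng.gauss(0, 1) for _ in theta] for _ in range(n_dirs)]
    grad = [0.0] * len(theta)
    for d in deltas:
        # Evaluate mirrored perturbations theta +/- nu * delta.
        r_plus = reward_fn([t + nu * di for t, di in zip(theta, d)])
        r_minus = reward_fn([t - nu * di for t, di in zip(theta, d)])
        for i, di in enumerate(d):
            grad[i] += (r_plus - r_minus) * di
    return [t + alpha * g / n_dirs for t, g in zip(theta, grad)]

# Toy stand-in for an episode return: reward peaks at theta = (1, -1).
reward = lambda th: -((th[0] - 1.0) ** 2 + (th[1] + 1.0) ** 2)
theta = [0.0, 0.0]
for _ in range(200):
    theta = ars_step(theta, reward)
```

MARS runs several such searches in parallel, sharing perturbation information across correlated tasks.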
- Published
- 2024
- Full Text
- View/download PDF
33. Optimal demand response based dynamic pricing strategy via Multi-Agent Federated Twin Delayed Deep Deterministic policy gradient algorithm.
- Author
-
Ma, Haining, Zhang, Huifeng, Tian, Ding, Yue, Dong, and Hancke, Gerhard P.
- Subjects
- *
TIME-based pricing , *FEDERATED learning , *REINFORCEMENT learning , *LOAD management (Electric power) , *DISTRIBUTED algorithms , *ALGORITHMS - Abstract
The intermittent integration of renewable energy sources and the enhancement of energy-saving awareness on the demand side have posed significant challenges to energy management in smart grids. To address the supply–demand imbalance issues caused by these challenges, this paper proposes a novel multiple time-scale dispatch model. The model utilizes a demand response strategy for demand side management (DSM) and employs a dynamic pricing strategy for supply-side management. Furthermore, an advanced approach based on federated learning and the Twin Delayed Deep Deterministic (TD3) policy gradient algorithm is proposed to construct a federated learning mechanism among multiple agents. This approach performs distributed optimization on the multi-agent system to derive the optimal network weights for multiple stakeholders, resulting in the generation of an optimal dynamic pricing strategy that ensures the economic efficiency and secure operation of the smart grid. From the methodological perspective, this approach enhances privacy protection and decision-making autonomy for each stakeholder, while improving the optimization efficiency of the reinforcement learning (RL) algorithm. Based on simulation results, it is evident that the proposed method facilitates economically efficient transactions between supply and demand, and the improved distributed multi-agent federated reinforcement learning (MAFRL) approach holds promise as a feasible and optimal optimization method for economic dispatch in the smart grid. [ABSTRACT FROM AUTHOR]
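The federated aggregation step can be sketched as FedAvg-style weight averaging: each stakeholder trains its policy locally and only the weights are shared, so raw demand data never leaves the agent. The flat weight vectors below are toy stand-ins, not the paper's TD3 networks.

```python
# Hedged sketch of federated averaging over per-agent policy weights;
# real networks would be flattened parameter tensors, not short lists.

def fed_avg(local_weights, sizes=None):
    """(Optionally size-weighted) element-wise mean of agent weight vectors."""
    n = len(local_weights)
    sizes = sizes or [1] * n          # equal weighting by default
    total = sum(sizes)
    dim = len(local_weights[0])
    return [sum(w[i] * s for w, s in zip(local_weights, sizes)) / total
            for i in range(dim)]

# Two stakeholders upload locally trained weights; only weights are shared.
global_w = fed_avg([[1.0, 2.0], [3.0, 4.0]])
```

The aggregated vector is then broadcast back so each agent resumes local TD3-style training from the shared model.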
- Published
- 2024
- Full Text
- View/download PDF
34. A vehicle value based ride-hailing order matching and dispatching algorithm.
- Author
-
Shi, Bing, Xia, Yiming, Xu, Shuai, and Luo, Yikai
- Subjects
- *
PUBLIC welfare , *SOCIAL services , *REINFORCEMENT learning , *PRICES , *ALGORITHMS , *RIDESHARING services - Abstract
Online ride-hailing has become one of the most important modes of transportation. In the ride-hailing system, how to efficiently match orders with vehicles and dispatch idle vehicles are key issues. The ride-hailing platform needs to match orders with vehicles and dispatch idle vehicles efficiently to maximize social welfare. However, the matching and dispatching decisions at the current round may affect the supply and demand of ride-hailing in future rounds, since they will affect the future vehicle distributions in different geographical zones. In fact, vehicles in different zones at different times may have different values for the matching and dispatching results. In this paper, we use the vehicle value function to characterize the spatio-temporal value of vehicles in each zone and then use it to design the order matching and idle vehicle dispatching algorithm to improve the long-term social welfare. In addition, in the order matching, passengers may untruthfully report the maximum price they are willing to pay to maximize their own profits, which can affect the order matching and thus may result in losses of long-term social welfare. Therefore, we design a VCG-based pricing algorithm to prevent the strategic behavior of passengers. We further run experiments to evaluate the proposed algorithm. The experimental results show that our algorithm can outperform the state-of-the-art algorithm in terms of social welfare by 11.7% and service ratio by 11.1%. This work can provide some useful insights for the online ride-hailing platform to design practical order matching and pricing strategies. • We design a vehicle value function to optimize order matching and improve the long-term social welfare by using a reinforcement learning method. • We consider the dispatching of idle vehicles to a zone as a virtual order and process the idle vehicle dispatching issue as (virtual) order matching as well. 
• We design a VCG based pricing method to prevent the strategic behavior of passengers and ensure positive profits for the platform. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Walking control of semi-passive robot via a modified Q-learning algorithm.
- Author
-
Sun, Zhongkui, Zhou, Yining, Xu, Wei, and Wang, Yuexin
- Subjects
- *
ROBOT control systems , *ALGORITHMS , *REINFORCEMENT learning , *Q-switched lasers , *DYNAMIC models - Abstract
The analysis and control of the stability of passive biped robots has been the subject of in-depth study because of their unique characteristics. This work offers a new perspective, focusing on a force applied to the foot during the collision phase of the walking process of a passive robot, a completely new way of applying an impulse force that differs from previous approaches. Accordingly, a new equation of motion is derived for this force, which is not parallel to the support leg, to provide a basis for further analysis. In addition, this paper combines the characteristics of passive walking with the reinforcement learning process, and an improved algorithm is designed to calculate the value of the control force. The simulation results show that the control method designed in this article not only enables an otherwise unsteady passive robot to walk stably, but also acts quickly. Furthermore, the selection range of initial values is expanded by using this method, which provides a convenient reference for later research work. • A new dynamic model of a passive robot is presented based on pulse control. • A modified Q-learning algorithm is applied in the control method. • Stable walking is achieved quickly under the introduced method. • The amount of work required to calculate the initial value for a stable walk is reduced. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. A self-learning differential evolution algorithm with population range indicator.
- Author
-
Zhao, Fuqing, Zhou, Hao, Xu, Tianpeng, and Jonrinaldi
- Subjects
- *
DIFFERENTIAL evolution , *DEEP reinforcement learning , *REINFORCEMENT learning , *AUTODIDACTICISM , *EVOLUTIONARY algorithms , *ALGORITHMS - Abstract
The differential evolution (DE) algorithm is widely regarded as one of the most influential evolutionary algorithms for addressing complex optimization problems. However, the fixed mutation strategy limits the adaptive ability of DE, and the lack of utilization of historical information limits the optimization ability of DE. In this paper, an indicator-based self-learning differential evolution algorithm (ISDE) is proposed. A jump-out mechanism based on deep reinforcement learning is adopted to control the mutation intensity of the population. The neural network in the jump-out mechanism is designed as a decision maker. The mutation intensity of the population is controlled by the neural network, and the neural network is trained by a double deep Q-network algorithm based on the continuous data generated during the evolution process. A population range indicator (PRI) is utilized to describe individual differences in the population. A diversity maintenance mechanism is designed to maintain individual differences according to the value of the PRI. The experimental results reveal that the comprehensive performance of ISDE is superior to comparison algorithms on the CEC 2017 real-parameter numerical optimization suite. [ABSTRACT FROM AUTHOR]
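For context, the classical DE/rand/1/bin step, the kind of fixed mutation scheme whose intensity ISDE instead controls adaptively, looks like this; F and CR are conventional defaults, not the paper's settings, and the population is a toy example.

```python
import random

# Sketch of the classical DE mutation + binomial crossover that produces a
# trial vector for individual i from three distinct random peers.

def de_rand_1_bin(pop, i, F=0.5, CR=0.9, rng=random.Random(0)):
    a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
    mutant = [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]
    jrand = rng.randrange(len(pop[i]))       # guarantee one mutant gene
    return [m if (rng.random() < CR or j == jrand) else t
            for j, (t, m) in enumerate(zip(pop[i], mutant))]

pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
trial = de_rand_1_bin(pop, 0)
```

In ISDE the fixed scale factor role is effectively taken over by a learned policy that decides how strongly the population should mutate at each stage.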
- Published
- 2024
- Full Text
- View/download PDF
37. How predictors affect the RL-based search strategy in Neural Architecture Search?
- Author
-
Wu, Jia, Deng, Tianjin, and Hu, Qi
- Subjects
- *
REINFORCEMENT learning , *AUTOMATIC speech recognition , *ALGORITHMS - Abstract
Predictor-based Neural Architecture Search is an important topic since it can efficiently reduce the computational cost of evaluating candidate architectures. Most existing predictor-based NAS algorithms aim to design different predictors to improve prediction performance. Unfortunately, even a promising performance predictor may suffer from accuracy decline due to long-term and continuous usage, thus leading to degraded performance of the search strategy. That naturally gives rise to the following problems: how do predictors affect search strategies, and how can the predictor be used efficiently? In this paper, we take a Reinforcement Learning (RL) based search strategy to study theoretically and empirically the impact of predictors on search strategies. We first formulate an RL-Predictor-based NAS algorithm as model-based RL and analyze it with a guarantee of monotonic improvement. Then, based on this analysis, we propose a simple procedure of predictor usage, named mixed batch, which contains ground-truth data and prediction data in a batch. The proposed procedure can efficiently reduce the impact of predictor errors on the RL-based search strategy while maintaining performance growth. Our algorithm, RL-Predictor-based Neural Architecture Search with Mixed batch (RPNASM), outperforms traditional NAS algorithms and prior state-of-the-art predictor-based NAS algorithms on three NAS-Bench-201 tasks and one NAS-Bench-ASR task. Our code is available at https://github.com/tjdeng/RPNASM. • Theoretically analyze the predictor's impact on the RL search strategy for the first time. • Perform comprehensive experiments to investigate RL-Predictor-based NAS algorithms. • Propose an RL-Predictor-based NAS framework to enhance search performance. [ABSTRACT FROM AUTHOR]
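The mixed-batch procedure can be sketched as a batch assembler that blends ground-truth evaluations with predictor outputs, capping the predictor's influence on each update. The architecture names, scores, and mixing ratio below are made up for illustration.

```python
# Hypothetical sketch: build a training batch for the RL controller from a
# bounded mix of real evaluations and predictor estimates.

def mixed_batch(ground_truth, predicted, batch_size, ratio=0.5):
    """Take ratio*batch_size items from real evaluations, the rest predicted."""
    n_gt = min(int(batch_size * ratio), len(ground_truth))
    n_pred = batch_size - n_gt
    return ground_truth[:n_gt] + predicted[:n_pred]

batch = mixed_batch(
    [("arch_a", 0.91), ("arch_b", 0.88)],                     # truly evaluated
    [("arch_c", 0.90), ("arch_d", 0.85), ("arch_e", 0.80)],   # predictor scores
    batch_size=4,
)
```

Because ground-truth rewards anchor every batch, accumulated predictor error can only perturb, not dominate, the policy update.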
- Published
- 2024
- Full Text
- View/download PDF
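The mixed-batch procedure the abstract describes — training the search strategy on batches that combine ground-truth architecture evaluations with predictor outputs — can be sketched as follows (function and parameter names are illustrative, not taken from the paper):

```python
import random

def mixed_batch(ground_truth, predictions, batch_size, gt_fraction=0.5):
    """Assemble a training batch that mixes ground-truth evaluations with
    predictor outputs, bounding how far predictor error can drift the
    RL search strategy."""
    n_gt = min(len(ground_truth), int(batch_size * gt_fraction))
    batch = random.sample(ground_truth, n_gt)
    batch += random.sample(predictions, batch_size - n_gt)
    random.shuffle(batch)
    return batch
```

Raising `gt_fraction` trades evaluation cost for robustness to predictor error.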
38. A DQL-NSGA-III algorithm for solving the flexible job shop dynamic scheduling problem.
- Author
-
Tang, Hongtao, Xiao, Yu, Zhang, Wei, Lei, Deming, Wang, Jing, and Xu, Tao
- Subjects
PRODUCTION scheduling ,PRODUCTION management (Manufacturing) ,REINFORCEMENT learning ,PRODUCTION planning ,ALGORITHMS - Abstract
In recent years, the flexible job shop dynamic scheduling problem (FJDSP) has received considerable attention; however, FJDSP with a transportation resource constraint is seldom investigated. In this study, FJDSP with a transportation resource constraint is considered, and an improved non-dominated sorting genetic algorithm-III (NSGA-III) integrated with reinforcement learning (RL), named DQNSGA, is proposed. In DQNSGA, an initialization method based on heuristic rules and an insertional greedy decoding approach are designed, and double Q-learning with an improved ε-greedy strategy is used to adaptively adjust the key parameters of NSGA-III. An improved elite selection strategy is also applied. Through extensive experiments and practical case studies, the algorithm is compared with three other well-known algorithms. The results demonstrate that DQNSGA exhibits significant effectiveness and superiority in all tests. The research presented in this paper enables effective adjustment of production plans in response to dynamic events, which is of critical importance for production management in the manufacturing industry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
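The double Q-learning with ε-greedy exploration that DQNSGA uses to adapt NSGA-III parameters can be sketched in its standard tabular form (the paper's "improved" ε-greedy variant and its state/action design are not specified here; names are illustrative):

```python
import random
from collections import defaultdict

def eps_greedy(q_a, q_b, state, actions, eps):
    """ε-greedy choice over two Q-tables: explore with probability eps,
    otherwise act greedily on the summed estimates."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q_a[(state, a)] + q_b[(state, a)])

def double_q_update(q_a, q_b, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Double Q-learning update: one table selects the best next action,
    the other evaluates it, reducing maximization bias."""
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: q_a[(s2, x)])
        q_a[(s, a)] += alpha * (r + gamma * q_b[(s2, a_star)] - q_a[(s, a)])
    else:
        b_star = max(actions, key=lambda x: q_b[(s2, x)])
        q_b[(s, a)] += alpha * (r + gamma * q_a[(s2, b_star)] - q_b[(s, a)])
```

In a parameter-adaptation setting, the "actions" would be candidate settings of NSGA-III's key parameters and the reward a measure of search progress.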
39. Task offloading and trajectory scheduling for UAV-enabled MEC networks: An MADRL algorithm with prioritized experience replay.
- Author
-
Shi, Huaguang, Tian, Yuxiang, Li, Hengji, Huang, Jian, Shi, Lei, and Zhou, Yi
- Subjects
REINFORCEMENT learning ,DRONE aircraft ,DEEP reinforcement learning ,MOBILE computing ,ALGORITHMS ,TECHNOLOGICAL innovations ,EDGE computing - Abstract
As a new network architecture, the air-ground cooperative network is a key enabler for future 6G networks to achieve ubiquitous connectivity. To effectively relieve the computational pressure of massive data in 6G wireless networks, Unmanned Aerial Vehicles (UAVs) equipped with Mobile Edge Computing (MEC) servers have become an emerging technology that provides computing resources for Mobile Devices (MDs). Given the limited on-board energy and computational capabilities, this paper investigates a multi-UAV collaborative assisted MEC architecture. The optimization problem of minimizing the total computational cost is constructed by jointly optimizing the UAV trajectories and the MD offloading strategies. The coupling between the optimization variables and the non-convexity of the problem make it difficult to solve directly. To address these concerns, the non-convex optimization problem is converted into a Markov decision process. The UAVs-assisted Offloading Strategy based on Reinforcement Learning (UOS-RL) algorithm is proposed to address the convergence difficulties caused by the high-dimensional continuous action space. Furthermore, the experience data generated by agents interacting with the environment is highly differentiated due to the highly dynamic variation of the environment. Hence, a Prioritized Experience Replay (PER) mechanism is proposed to improve the training efficiency of the UOS-RL algorithm based on the priority of the experience data. Simulation results show that the proposed PER-UOS-RL algorithm outperforms existing works in terms of computational cost. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
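A prioritized experience replay buffer of the kind PER-UOS-RL relies on can be sketched minimally: transitions with larger TD-error are sampled more often. This is a generic proportional-PER sketch (linear scan rather than the sum-tree used in practice), not the paper's exact mechanism:

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay: sampling probability is
    proportional to |TD-error|**alpha, so informative transitions from a
    fast-changing environment are revisited more often."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:   # evict oldest when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        return random.choices(self.buffer, weights=self.priorities, k=k)
```

Production implementations also apply importance-sampling weights to correct the bias this non-uniform sampling introduces.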
40. The perceptron algorithm with uneven margins based transfer learning for turbofan engine fault detection.
- Author
-
Zhao, Yong-Ping and Cai, Wen
- Subjects
TRANSFER of training ,TURBOFAN engines ,MACHINE learning ,ALGORITHMS ,REINFORCEMENT learning ,KNOWLEDGE transfer ,INFORMATION resources - Abstract
Aeroengine fault detection is an important means to ensure flight safety. The application premise of data-driven fault detection methods is that all data come from the same distribution. However, this assumption is invalid in actual engine fault detection, because the engine state changes as operating time increases, and the collected data therefore exhibit distribution differences. Due to the high cost of collecting engine data, it is difficult to collect enough data from the current state. Fortunately, transfer learning can transfer data information from other fields to the target field, thus alleviating the problem of data scarcity in the target domain. Therefore, drawing on the idea of transfer learning, this paper proposes a cross-domain aeroengine fault detection method, viz. transfer learning based on the kernel perceptron algorithm with uneven margins (TL-KPAUM). The proposed method is divided into two stages. In the first stage, KPAUM is trained using data in the source domain to extract the information in the source domain. In the second stage, the data in the target domain are used to realize adaptation to the target domain. Compared with several baseline methods, TL-KPAUM performs better when the amount of data in the target domain is small. Finally, using simulation data of a turbofan engine, fault detection experiments on degraded states are designed, and the results show that the method is effective. • A transfer learning algorithm is proposed for aeroengine fault detection. • The proposed method is applied to engine fault detection in different domains. • Simulation results show the effectiveness of this method compared with other methods. • The proposed method performs well in the case of sparse data in the target domain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
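The perceptron with uneven margins that TL-KPAUM builds on assigns different margin thresholds to the positive and negative classes, which suits fault detection where faulty samples are much rarer than healthy ones. A linear (non-kernel) sketch of the training loop, with illustrative names:

```python
def train_paum(data, tau_pos, tau_neg, lr=1.0, epochs=10):
    """Perceptron algorithm with uneven margins: an example triggers an
    update whenever it falls inside its class-specific margin, so the
    learned hyperplane can be pushed further from the rarer class."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:  # y in {+1, -1}
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            margin = tau_pos if y > 0 else tau_neg
            if y * score <= margin:  # inside the (uneven) margin: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b
```

The kernelized version used in the paper replaces the dot product with a kernel evaluation; the update structure is the same.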
41. Soft actor-critic DRL algorithm for interval optimal dispatch of integrated energy systems with uncertainty in demand response and renewable energy.
- Author
-
Dong, Yingchao, Zhang, Hongli, Wang, Cong, and Zhou, Xiaojun
- Subjects
DEEP reinforcement learning ,OPTIMIZATION algorithms ,MICROGRIDS ,RENEWABLE energy sources ,ALGORITHMS ,DYNAMICAL systems ,REINFORCEMENT learning - Abstract
The collaborative optimization dispatching of multiple energy flows plays a crucial role in achieving the economic, efficient, and low-carbon operation of integrated energy systems (IESs). However, the dispatching problem for IESs is characterized by high dimensionality, non-linearity, and complex coupling. Furthermore, the integration of renewable energy sources and flexible loads has transformed the IES into a complex dynamic system with significant uncertainty. Traditional intelligent optimization algorithms exhibit poor adaptability and lengthy computation times when tackling such problems. In contrast, deep reinforcement learning (DRL), as an interactive trial-and-error learning method, has shown improved decision-making capabilities. In view of this, a data-driven soft actor-critic (SAC) deep reinforcement learning approach is proposed in this paper for the interval optimal dispatch of IESs considering multiple uncertainties. First, the basic principle of SAC reinforcement learning is introduced in detail, and the basic reinforcement learning framework for the interval optimal scheduling of IESs is constructed. Then, the environment model for agent interaction is constructed, and the action and state spaces of the DRL, as well as the reward mechanism and neural network structure, are designed. Finally, a typical IES case is experimentally analyzed and compared with three popular DRL algorithms and five state-of-the-art intelligent optimization algorithms. The experimental results demonstrate the advantages and effectiveness of the proposed method in solving the optimal dispatching of IESs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
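The core of SAC is its entropy-regularized objective: the critic's bootstrap target adds an entropy bonus so the policy stays stochastic, which aids exploration under renewable-output and demand uncertainty. A one-line sketch of the soft Bellman target (standard SAC, not the paper's full training loop):

```python
def soft_q_target(reward, next_q, next_log_prob, alpha=0.2, gamma=0.99):
    """Soft Bellman target used by SAC critics: the entropy bonus
    (-alpha * log pi(a'|s')) rewards stochastic, exploratory policies.
    alpha is the temperature trading off reward against entropy."""
    return reward + gamma * (next_q - alpha * next_log_prob)
```

With alpha = 0 this reduces to the ordinary bootstrapped Q-target; larger alpha keeps the dispatch policy more exploratory.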
42. Reinforcement learning with formal performance metrics for quadcopter attitude control under non-nominal contexts.
- Author
-
Bernini, Nicola, Bessa, Mikhail, Delmas, Rémi, Gold, Arthur, Goubault, Eric, Pennec, Romain, Putot, Sylvie, and Sillion, François
- Subjects
REINFORCEMENT learning ,ALGORITHMS ,LOGIC - Abstract
We explore the reinforcement learning approach to designing controllers by extensively discussing the case of a quadcopter attitude controller. We provide all the details needed to reproduce our approach, starting with a model of the dynamics of a Crazyflie 2.0 under various nominal and non-nominal conditions, including partial motor failures and wind gusts. We develop a robust form of signal temporal logic to quantitatively evaluate the vehicle's behavior and measure the performance of controllers. The paper thoroughly describes the choices of training algorithms, neural net architecture, hyperparameters, and observation space in view of the different performance metrics we have introduced. We discuss the robustness of the obtained controllers, both to partial loss of power for one rotor and to wind gusts, and finish by drawing conclusions on practical controller design by reinforcement learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
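The robust (quantitative) semantics of signal temporal logic replaces a true/false verdict with a real-valued margin. For a formula like "always keep |attitude error| below a bound", the robustness of a trace is the worst-case slack, sketched below (a textbook STL robustness computation, not the paper's specific formulas):

```python
def always_bounded_robustness(trace, bound):
    """Quantitative semantics of the STL formula G(|x| < bound):
    positive robustness means the property holds with that much margin,
    negative means it is violated. Such a score can serve both as a
    performance metric and as a shaped RL reward."""
    return min(bound - abs(x) for x in trace)
```

Because the value degrades gracefully instead of flipping to False, it gives the learner a gradient-like signal even on near-violating trajectories.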
43. Collaborative caching relay algorithm based on recursive deep reinforcement learning in mobile vehicle edge network.
- Author
-
Wu, Honghai, Wang, Baibing, Ma, Huahong, and Xing, Ling
- Subjects
DEEP reinforcement learning ,REINFORCEMENT learning ,MOBILE learning ,PARTIALLY observable Markov decision processes ,LONG-term memory ,VEHICLE routing problem ,ALGORITHMS - Abstract
With the rapid development of the Internet of Vehicles (IoV) and the continuous emergence of vehicle information applications, the demand for content in vehicular networking is growing at an alarming rate. Mobile vehicular edge caching is regarded as a promising technology for improving Quality of Service (QoS) and reducing latency. Many caching algorithms have been proposed, which usually place contents in Road Side Units (RSUs) to provide services to nearby users. However, due to the high-speed movement of vehicles and the limited coverage of RSUs, caching interrupts occur frequently, leading to a deterioration in service quality. To deal with this problem, we make full use of Vehicle-to-Vehicle (V2V) collaboration to construct a caching system that does not require RSU support, and propose a Recursive Deep Reinforcement Learning based Collaborative Caching Relay strategy (RDRL-CR). To minimize the service delay under capacity constraints, the caching problem is formulated as an integer linear programming problem, and caching decisions are made through a partially observable Markov Decision Process (MDP). Specifically, the strategy utilizes a Graph Neural Network (GNN) to predict vehicle trajectories, and then selects vehicles that can serve as caching nodes by calculating link stability metrics between vehicles. A Long Short-Term Memory (LSTM) network is embedded into a deep deterministic policy gradient algorithm to reach the final caching decision. Compared with existing caching strategies, the proposed caching strategy improves the caching hit rate by about 25% and reduces content access latency by about 20%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library