Descriptor: "Deep Reinforcement Learning" / Journal: sensors - Searchworks@Jio Institute Digital Library Search Results

1. Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Author: Yana Yang, Meng Xi, Huiao Dai, Jiabao Wen, and Jiachen Yang
Subjects: deep reinforcement learning, off policy, priority experience replay, z-score, Chemical technology, TP1-1185
Abstract: Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning possesses powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences, making it easier to find global optimal solutions. Understanding how to utilize experiences is crucial for improving the efficiency of off-policy reinforcement learning algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which enhances the utilization of experiences and improves the performance and convergence speed of the algorithm. A series of ablation experiments demonstrate that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms.
Published: 2024
Full Text: View/download PDF

2. Deep Reinforcement Learning-Driven Jamming-Enhanced Secure Unmanned Aerial Vehicle Communications

Author: Zhifang Xing, Yunhui Qin, Changhao Du, Wenzhang Wang, and Zhongshan Zhang
Subjects: unmanned aerial vehicle (UAV), jamming UAV, deep reinforcement learning, sequential decision problem, Chemical technology, TP1-1185
Abstract: Despite its flexibility, unmanned aerial vehicle (UAV) communications are susceptible to eavesdropping due to the open nature of wireless channels and the broadcasting nature of wireless signals. This paper studies secure UAV communications and proposes a method to optimize the minimum secrecy rate of the system by using interference technology to enhance it. To this end, the system not only deploys multiple UAV base stations (BSs) to provide services to legitimate users but also assigns dedicated UAV jammers to send interference signals to active or potential eavesdroppers to disrupt their eavesdropping effectiveness. Based on this configuration, we formulate the optimization process of parameters such as the user association variables, UAV trajectory, and output power as a sequential decision-making problem and use the single-agent soft actor-critic (SAC) algorithm and twin delayed deep deterministic policy gradient (TD3) algorithm to achieve joint optimization of the core parameters. In addition, for specific scenarios, we also use the multi-agent soft actor-critic (MASAC) algorithm to solve the joint optimization problem mentioned above. The numerical results show that the normalized average secrecy rate of the MASAC algorithm increased by more than 6.6% and 14.2% compared with that of the SAC and TD3 algorithms, respectively.
Published: 2024
Full Text: View/download PDF

3. Energy-Saving Multi-Agent Deep Reinforcement Learning Algorithm for Drone Routing Problem

Author: Xiulan Shu, Anping Lin, and Xupeng Wen
Subjects: energy savings, deep reinforcement learning, drone routing, multiple agents, Chemical technology, TP1-1185
Abstract: With the rapid advancement of drone technology, the efficient distribution of drones has garnered significant attention. Central to this discourse is the energy consumption of drones, a critical metric for assessing energy-efficient distribution strategies. Accordingly, this study delves into the energy consumption factors affecting drone distribution. A primary challenge in drone distribution lies in devising optimal, energy-efficient routes for drones. However, traditional routing algorithms, predominantly heuristic-based, exhibit certain limitations. These algorithms often rely on heuristic rules and expert knowledge, which can constrain their ability to escape local optima. Motivated by these shortcomings, we propose a novel multi-agent deep reinforcement learning algorithm that integrates a drone energy consumption model, namely EMADRL. The EMADRL algorithm first formulates the drone routing problem within a multi-agent reinforcement learning framework. It subsequently designs a strategy network model comprising multiple agent networks, tailored to address the node adjacency and masking complexities typical of multi-depot vehicle routing problem. Training utilizes strategy gradient algorithms and attention mechanisms. Furthermore, local and sampling search strategies are introduced to enhance solution quality. Extensive experimentation demonstrates that EMADRL consistently achieves high-quality solutions swiftly. A comparative analysis against contemporary algorithms reveals EMADRL’s superior energy efficiency, with average energy savings of 5.96% and maximum savings reaching 12.45%. Thus, this approach offers a promising new avenue for optimizing energy consumption in last-mile distribution scenarios.
Published: 2024
Full Text: View/download PDF

4. Deep Reinforcement Learning with Local Attention for Single Agile Optical Satellite Scheduling Problem

Author: Zheng Liu, Wei Xiong, Chi Han, and Xiaolan Yu
Subjects: single agile optical satellite scheduling, deep reinforcement learning, local attention, adaptive learning rate, Chemical technology, TP1-1185
Abstract: This paper investigates the single agile optical satellite scheduling problem, which has received increasing attention due to the rapid growth in earth observation requirements. Owing to the complicated constraints and considerable solution space of this problem, the conventional exact methods and heuristic methods, which are sensitive to the problem scale, demand high computational expenses. Thus, an efficient approach is demanded to solve this problem, and this paper proposes a deep reinforcement learning algorithm with a local attention mechanism. A mathematical model is first established to describe this problem, which considers a series of complex constraints and takes the profit ratio of completed tasks as the optimization objective. Then, a neural network framework with an encoder–decoder structure is adopted to generate high-quality solutions, and a local attention mechanism is designed to improve the generation of solutions. In addition, an adaptive learning rate strategy is proposed to guide the actor–critic training algorithm to dynamically adjust the learning rate in the training process to enhance the training effectiveness of the proposed network. Finally, extensive experiments verify that the proposed algorithm outperforms the comparison algorithms in terms of solution quality, generalization performance, and computation efficiency.
Published: 2024
Full Text: View/download PDF

5. Sequence Decision Transformer for Adaptive Traffic Signal Control

Author: Rui Zhao, Haofeng Hu, Yun Li, Yuze Fan, Fei Gao, and Zhenhai Gao
Subjects: adaptive traffic signal control, deep reinforcement learning, Markov decision process, attention mechanism, proximal policy optimization, Chemical technology, TP1-1185
Abstract: Urban traffic congestion poses significant economic and environmental challenges worldwide. To mitigate these issues, Adaptive Traffic Signal Control (ATSC) has emerged as a promising solution. Recent advancements in deep reinforcement learning (DRL) have further enhanced ATSC’s capabilities. This paper introduces a novel DRL-based ATSC approach named the Sequence Decision Transformer (SDT), employing DRL enhanced with attention mechanisms and leveraging the robust capabilities of sequence decision models, akin to those used in advanced natural language processing, adapted here to tackle the complexities of urban traffic management. Firstly, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. Subsequently, we propose SDT, specifically tailored to solve the MDP problem. The SDT model uses a transformer-based architecture with an encoder and decoder in an actor–critic structure. The encoder processes observations and outputs, both encoded data for the decoder, and value estimates for parameter updates. The decoder, as the policy network, outputs the agent’s actions. Proximal Policy Optimization (PPO) is used to update the policy network based on historical data, enhancing decision-making in ATSC. This approach significantly reduces training times, effectively manages larger observation spaces, captures dynamic changes in traffic conditions more accurately, and enhances traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios by comparing the number of vehicles, average speed, and queue length against three baselines, including PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. SDT shows improvements of 26.8%, 150%, and 21.7% over traditional ATSC algorithms, and 18%, 30%, and 15.6% over the FRAP. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
Published: 2024
Full Text: View/download PDF

6. A Wide-Range TCSC Based ADN in Mountainous Areas Considering Hydropower-Photovoltaic-ESS Complementarity

Author: Yao Guo, Shaorong Wang, and Dezhi Chen
Subjects: TCSC, active distribution network, deep reinforcement learning, optimization operation, Chemical technology, TP1-1185
Abstract: Due to the radial network structures, small cross-sectional lines, and light loads characteristic of existing AC distribution networks in mountainous areas, the development of active distribution networks (ADNs) in these regions has revealed significant issues with integrating distributed generation (DGs) and consuming renewable energy. Focusing on this issue, this paper proposes a wide-range thyristor-controlled series compensation (TCSC)-based ADN and presents a deep reinforcement learning (DRL)-based optimal operation strategy. This strategy takes into account the complementarity of hydropower, photovoltaic (PV) systems, and energy storage systems (ESSs) to enhance the capacity for consuming renewable energy. In the proposed ADN, a wide-range TCSC connects the sub-networks where PV and hydropower systems are located, with ESSs configured for each renewable energy generation. The designed wide-range TCSC allows for power reversal and improves power delivery efficiency, providing conditions for the optimization operation. The optimal operation issue is formulated as a Markov decision process (MDP) with continuous action space and solved using the twin delayed deep deterministic policy gradient (TD3) algorithm. The optimal objective is to maximize the consumption of renewable energy sources (RESs) and minimize line losses by coordinating the charging/discharging of ESSs with the operation mode of the TCSC. The simulation results demonstrate the effectiveness of the proposed method.
Published: 2024
Full Text: View/download PDF

7. Deep Deterministic Policy Gradient-Based Resource Allocation Considering Network Slicing and Device-to-Device Communication in Mobile Networks

Author: Hudson Henrique de Souza Lopes, Lucas Jose Ferreira Lima, Telma Woerle de Lima Soares, and Flávio Henrique Teles Vieira
Subjects: deep reinforcement learning, network slicing, device-to-device, resource allocation, Chemical technology, TP1-1185
Abstract: Next-generation mobile networks, such as those beyond the 5th generation (B5G) and 6th generation (6G), have diverse network resource demands. Network slicing (NS) and device-to-device (D2D) communication have emerged as promising solutions for network operators. NS is a candidate technology for this scenario, where a single network infrastructure is divided into multiple (virtual) slices to meet different service requirements. Combining D2D and NS can improve spectrum utilization, providing better performance and scalability. This paper addresses the challenging problem of dynamic resource allocation with wireless network slices and D2D communications using deep reinforcement learning (DRL) techniques. More specifically, we propose an approach named DDPG-KRP based on deep deterministic policy gradient (DDPG) with K-nearest neighbors (KNNs) and reward penalization (RP) for undesirable action elimination to determine the resource allocation policy maximizing long-term rewards. The simulation results show that the DDPG-KRP is an efficient solution for resource allocation in wireless networks with slicing, outperforming other considered DRL algorithms.
Published: 2024
Full Text: View/download PDF

8. Enhancing the Minimum Awareness Failure Distance in V2X Communications: A Deep Reinforcement Learning Approach

Author: Anthony Kyung Guzmán Leguel, Hoa-Hung Nguyen, David Gómez Gutiérrez, Jinwoo Yoo, and Han-You Jeong
Subjects: vehicle-to-everything (V2X) communications, beaconing, deep reinforcement learning, vehicle awareness, minimum awareness failure distance, Chemical technology, TP1-1185
Abstract: Vehicle-to-everything (V2X) communication is pivotal in enhancing cooperative awareness in vehicular networks. Typically, awareness is viewed as a vehicle’s ability to perceive and share real-time kinematic information. We present a novel definition of awareness in V2X communications, conceptualizing it as a multi-faceted concept involving vehicle detection, tracking, and maintaining their safety distances. To enhance this awareness, we propose a deep reinforcement learning framework for the joint control of beacon rate and transmit power (DRL-JCBRTP). Our DRL−JCBRTP framework integrates LSTM-based actor networks and MLP-based critic networks within the Soft Actor-Critic (SAC) algorithm to effectively learn optimal policies. Leveraging local state information, the DRL-JCBRTP scheme uses an innovative reward function to increase the minimum awareness failure distance. Our SLMLab-Gym-VEINS simulations show that the DRL-JCBRTP scheme outperforms existing beaconing schemes in minimizing awareness failure probability and maximizing awareness distance, ultimately improving driving safety.
Published: 2024
Full Text: View/download PDF

9. Transformer-Based Reinforcement Learning for Multi-Robot Autonomous Exploration

Author: Qihong Chen, Rui Wang, Ming Lyu, and Jie Zhang
Subjects: deep reinforcement learning, robot exploration, artificial neural network, Chemical technology, TP1-1185
Abstract: A map of the environment is the basis for the robot’s navigation. Multi-robot collaborative autonomous exploration allows for rapidly constructing maps of unknown environments, essential for application areas such as search and rescue missions. Traditional autonomous exploration methods are inefficient due to the repetitive exploration problem. For this reason, we propose a multi-robot autonomous exploration method based on the Transformer model. Our multi-agent deep reinforcement learning method includes a multi-agent learning method to effectively improve exploration efficiency. We conducted experiments comparing our proposed method with existing methods in a simulation environment, and the experimental results showed that our proposed method had a good performance and a specific generalization ability.
Published: 2024
Full Text: View/download PDF

10. Deep Reinforcement Learning-Based Adaptive Scheduling for Wireless Time-Sensitive Networking

Author: Hanjin Kim, Young-Jin Kim, and Won-Tae Kim
Subjects: wireless time-sensitive networking, time-sensitive networking, time-aware shaper, deep reinforcement learning, wireless LAN, Chemical technology, TP1-1185
Abstract: Time-sensitive networking (TSN) technologies have garnered attention for supporting time-sensitive communication services, with recent interest extending to the wireless domain. However, adapting TSN to wireless areas faces challenges due to the competitive channel utilization in IEEE 802.11, necessitating exclusive channels for low-latency services. Additionally, traditional TSN scheduling algorithms may cause significant transmission delays due to dynamic wireless characteristics, which must be addressed. This paper proposes a wireless TSN model of IEEE 802.11 networks for the exclusive channel access and a novel time-sensitive traffic scheduler, named the wireless intelligent scheduler (WISE), based on deep reinforcement learning. We designed a deep reinforcement learning (DRL) framework to learn the repetitive transmission patterns of time-sensitive traffic and address potential latency issues from changing wireless conditions. Within this framework, we identified the most suitable DRL model, presenting the WISE algorithm with the best performance. Experimental results indicate that the proposed mechanisms meet up to 99.9% under the various wireless communication scenarios. In addition, they show that the processing delay is successfully limited within the specific time requirements and the scalability of TSN streams is guaranteed by the proposed mechanisms.
Published: 2024
Full Text: View/download PDF

11. Integrating Heuristic Methods with Deep Reinforcement Learning for Online 3D Bin-Packing Optimization

Author: Ching-Chang Wong, Tai-Ting Tsai, and Can-Kun Ou
Subjects: 3D bin-packing, deep reinforcement learning, proximal policy optimization, heuristic algorithms, Chemical technology, TP1-1185
Abstract: This study proposes a method named Hybrid Heuristic Proximal Policy Optimization (HHPPO) to implement online 3D bin-packing tasks. Some heuristic algorithms for bin-packing and the Proximal Policy Optimization (PPO) algorithm of deep reinforcement learning are integrated to implement this method. In the heuristic algorithms for bin-packing, an extreme point priority sorting method is proposed to sort the generated extreme points according to their waste spaces to improve space utilization. In addition, a 3D grid representation of the space status of the container is used, and some partial support constraints are proposed to increase the possibilities for stacking objects and enhance overall space utilization. In the PPO algorithm, some heuristic algorithms are integrated, and the reward function and the action space of the policy network are designed so that the proposed method can effectively complete the online 3D bin-packing task. Some experimental results illustrate that the proposed method has good results in achieving online 3D bin-packing tasks in some simulation environments. In addition, an environment with image vision is constructed to show that the proposed method indeed enables an actual robot manipulator to successfully and effectively complete the bin-packing task in a real environment.
Published: 2024
Full Text: View/download PDF

12. Integrated Quality of Service for Offline and Online Services in Edge Networks via Task Offloading and Service Caching

Author: Chuangqiang Zhan, Shaojie Zheng, Jingyu Chen, Jiachao Liang, and Xiaojie Zhou
Subjects: online, offline, service caching, inference, mobile edge computing, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: Edge servers frequently manage their own offline digital twin (DT) services, in addition to caching online digital twin services. However, current research often overlooks the impact of offline caching services on memory and computation resources, which can hinder the efficiency of online service task processing on edge servers. In this study, we concentrated on service caching and task offloading within a collaborative edge computing system by emphasizing the integrated quality of service (QoS) for both online and offline edge services. We considered the resource usage of both online and offline services, along with incoming online requests. To maximize the overall QoS utility, we established an optimization objective that rewards the throughput of online services while penalizing offline services that miss their soft deadlines. We formulated this as a utility maximization problem, which was proven to be NP-hard. To tackle this complexity, we reframed the optimization problem as a Markov decision process (MDP) and introduced a joint optimization algorithm for service caching and task offloading by leveraging the deep Q-network (DQN). Comprehensive experiments revealed that our algorithm enhanced the utility by at least 14.01% compared with the baseline algorithms.
Published: 2024
Full Text: View/download PDF

13. Optimizing Robotic Mobile Fulfillment Systems for Order Picking Based on Deep Reinforcement Learning

Author: Zhenyi Zhu, Sai Wang, and Tuantuan Wang
Subjects: supply chain management, automatic warehousing system, deep reinforcement learning, order allocation and sequencing, robot collaborative scheduling, robotic mobile fulfillment systems, Chemical technology, TP1-1185
Abstract: Robotic Mobile Fulfillment Systems (RMFSs) face challenges in handling large-scale orders and navigating complex environments, frequently encountering a series of intricate decision-making problems, such as order allocation, shelf selection, and robot scheduling. To address these challenges, this paper integrates Deep Reinforcement Learning (DRL) technology into an RMFS, to meet the needs of efficient order processing and system stability. This study focuses on three key stages of RMFSs: order allocation and sorting, shelf selection, and coordinated robot scheduling. For each stage, mathematical models are established and the corresponding solutions are proposed. Unlike traditional methods, DRL technology is introduced to solve these problems, utilizing a Genetic Algorithm and Ant Colony Optimization to handle decision making related to large-scale orders. Through simulation experiments, performance indicators—such as shelf access frequency and the total processing time of the RMFS—are evaluated. The experimental results demonstrate that, compared to traditional methods, our algorithms excel in handling large-scale orders, showcasing exceptional superiority, capable of completing approximately 110 tasks within an hour. Future research should focus on integrated decision-making modeling for each stage of RMFSs and designing efficient heuristic algorithms for large-scale problems, to further enhance system performance and efficiency.
Published: 2024
Full Text: View/download PDF

14. Dynamic UAV Deployment Scheme Based on Edge Computing for Forest Fire Scenarios

Author: Weihao Zuo and Yongju Xian
Subjects: forest fire, UAV, edge computing, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: This study investigates the dynamic deployment of unmanned aerial vehicles (UAVs) using edge computing in a forest fire scenario. We consider the dynamically changing characteristics of forest fires and the corresponding varying resource requirements. Based on this, this paper models a two-timescale UAV dynamic deployment scheme by considering the dynamic changes in the number and position of UAVs. In the slow timescale, we use a gate recurrent unit (GRU) to predict the number of future users and determine the number of UAVs based on the resource requirements. UAVs with low energy are replaced accordingly. In the fast timescale, a deep-reinforcement-learning-based UAV position deployment algorithm is designed to enable the low-latency processing of computational tasks by adjusting the UAV positions in real time to meet the ground devices’ computational demands. The simulation results demonstrate that the proposed scheme achieves better prediction accuracy. The number and position of UAVs can be adapted to resource demand changes and reduce task execution delays.
Published: 2024
Full Text: View/download PDF

15. Joint Optimization of Age of Information and Energy Consumption in NR-V2X System Based on Deep Reinforcement Learning

Author: Shulin Song, Zheng Zhang, Qiong Wu, Pingyi Fan, and Qiang Fan
Subjects: new radio vehicle-to-everything, deep reinforcement learning, non-orthogonal multiple access, age of information, energy consumption, Chemical technology, TP1-1185
Abstract: As autonomous driving may be the most important application scenario of the next generation, the development of wireless access technologies enabling reliable and low-latency vehicle communication becomes crucial. To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allowing direct communication between vehicles. This supplements SL communication in LTE-V2X and represents the latest advancements in cellular V2X (C-V2X) with the improved performance of NR-V2X. However, in NR-V2X Mode 2, resource collisions still occur and thus degrade the age of information (AOI). Therefore, an interference cancellation method is employed to mitigate this impact by combining NR-V2X with Non-Orthogonal multiple access (NOMA) technology. In NR-V2X, when vehicles select smaller resource reservation intervals (RRIs), higher-frequency transmissions use more energy to reduce AoI. Hence, it is important to jointly considerAoI and communication energy consumption based on NR-V2X communication. Then, we formulate such an optimization problem and employ the Deep Reinforcement Learning (DRL) algorithm to compute the optimal transmission RRI and transmission power for each transmitting vehicle to reduce the energy consumption of each transmitting vehicle and the AoI of each receiving vehicle. Extensive simulations demonstrate the performance of our proposed algorithm.
Published: 2024
Full Text: View/download PDF

16. Towards Robust Decision-Making for Autonomous Highway Driving Based on Safe Reinforcement Learning

Author: Rui Zhao, Ziguo Chen, Yuze Fan, Yun Li, and Fei Gao
Subjects: autonomous driving, importance sampling, catastrophic forgetting, deep reinforcement learning, constrained policy optimization, Chemical technology, TP1-1185
Abstract: Reinforcement Learning (RL) methods are regarded as effective for designing autonomous driving policies. However, even when RL policies are trained to convergence, ensuring their robust safety remains a challenge, particularly in long-tail data. Therefore, decision-making based on RL must adequately consider potential variations in data distribution. This paper presents a framework for highway autonomous driving decisions that prioritizes both safety and robustness. Utilizing the proposed Replay Buffer Constrained Policy Optimization (RECPO) method, this framework updates RL strategies to maximize rewards while ensuring that the policies always remain within safety constraints. We incorporate importance sampling techniques to collect and store data in a Replay buffer during agent operation, allowing the reutilization of data from old policies for training new policy models, thus mitigating potential catastrophic forgetting. Additionally, we transform the highway autonomous driving decision problem into a Constrained Markov Decision Process (CMDP) and apply our proposed RECPO for training, optimizing highway driving policies. Finally, we deploy our method in the CARLA simulation environment and compare its performance in typical highway scenarios against traditional CPO, current advanced strategies based on Deep Deterministic Policy Gradient (DDPG), and IDM + MOBIL (Intelligent Driver Model and the model for minimizing overall braking induced by lane changes). The results show that our framework significantly enhances model convergence speed, safety, and decision-making stability, achieving a zero-collision rate in highway autonomous driving.
Published: 2024
Full Text: View/download PDF

17. Piston Error Automatic Correction for Segmented Mirrors via Deep Reinforcement Learning

Author: Dequan Li, Dong Wang, and Dejie Yan
Subjects: segmented mirrors, deep reinforcement learning, co-phase error, Chemical technology, TP1-1185
Abstract: The segmented mirror co-phase error identification technique based on supervised learning methods has the advantages of simple application conditions, no dependence on custom sensors, a fast calculation speed, and low computing power requirements compared with other methods. However, it is often difficult to obtain a high accuracy in practical application situations with this method because of the difference between the training model and the actual model. The reinforcement learning algorithm does not need to model the real system when operating the system. However, it still retains the advantages of supervised learning. Thus, in this paper, we placed a mask on the pupil plane of the segmented telescope optical system. Moreover, based on the wide spectrum, point spread function, and modulation transfer function of the optical system and deep reinforcement learning—without modeling the optical system—a large-range and high-precision piston error automatic co-phase method with multiple-submirror parallelization was proposed. Finally, we carried out relevant simulation experiments, and the results indicate that the method is effective.
Published: 2024
Full Text: View/download PDF

18. Autonomous Navigation by Mobile Robot with Sensor Fusion Based on Deep Reinforcement Learning

Author: Yang Ou, Yiyi Cai, Youming Sun, and Tuanfa Qin
Subjects: sensor data fusion, deep reinforcement learning, autonomous navigation, mobile robots, Chemical technology, TP1-1185
Abstract: In the domain of mobile robot navigation, conventional path-planning algorithms typically rely on predefined rules and prior map information, which exhibit significant limitations when confronting unknown, intricate environments. With the rapid evolution of artificial intelligence technology, deep reinforcement learning (DRL) algorithms have demonstrated considerable effectiveness across various application scenarios. In this investigation, we introduce a self-exploration and navigation approach based on a deep reinforcement learning framework, aimed at resolving the navigation challenges of mobile robots in unfamiliar environments. Firstly, we fuse data from the robot’s onboard lidar sensors and camera and integrate odometer readings with target coordinates to establish the instantaneous state of the decision environment. Subsequently, a deep neural network processes these composite inputs to generate motion control strategies, which are then integrated into the local planning component of the robot’s navigation stack. Finally, we employ an innovative heuristic function capable of synthesizing map information and global objectives to select the optimal local navigation points, thereby guiding the robot progressively toward its global target point. In practical experiments, our methodology demonstrates superior performance compared to similar navigation methods in complex, unknown environments devoid of predefined map information.
Published: 2024
Full Text: View/download PDF

19. DRL-GAN: A Hybrid Approach for Binary and Multiclass Network Intrusion Detection

Author: Caroline Strickland, Muhammad Zakar, Chandrika Saha, Sareh Soltani Nejad, Noshin Tasnim, Daniel J. Lizotte, and Anwar Haque
Subjects: network security, Network Intrusion Detection System, Deep Reinforcement Learning, Generative Adversarial Networks, NSL-KDD, Machine Learning, Chemical technology, TP1-1185
Abstract: Our increasingly connected world continues to face an ever-growing number of network-based attacks. An Intrusion Detection System (IDS) is an essential security technology used for detecting these attacks. Although numerous Machine Learning-based IDSs have been proposed for the detection of malicious network traffic, the majority have difficulty properly detecting and classifying the more uncommon attack types. In this paper, we implement a novel hybrid technique using synthetic data produced by a Generative Adversarial Network (GAN) to use as input for training a Deep Reinforcement Learning (DRL) model. Our GAN model is trained on the NSL-KDD dataset, a publicly available collection of labeled network traffic data specifically designed to support the evaluation and benchmarking of IDSs. Ultimately, our findings demonstrate that training the DRL model on synthetic datasets generated by specific GAN models can result in better performance in correctly classifying minority classes over training on the true imbalanced dataset.
Published: 2024
Full Text: View/download PDF

20. Minimizing Task Age upon Decision for Low-Latency MEC Networks Task Offloading with Action-Masked Deep Reinforcement Learning

Author: Zhouxi Jiang, Jianfeng Yang, and Xun Gao
Subjects: low-latency mobile edge computing, age upon decision, finite blocklength regime, deep reinforcement learning, maskable proximal policy optimization, Chemical technology, TP1-1185
Abstract: In this paper, we consider a low-latency Mobile Edge Computing (MEC) network where multiple User Equipment (UE) wirelessly reports to a decision-making edge server. At the same time, the transmissions are operated with Finite Blocklength (FBL) codes to achieve low-latency transmission. We introduce the task of Age upon Decision (AuD) aimed at the timeliness of tasks used for decision-making, which highlights the timeliness of the information at decision-making moments. For the case in which dynamic task generation and random fading channels are considered, we provide a task AuD minimization design by jointly selecting UE and allocating blocklength. In particular, to solve the task AuD minimization problem, we transform the optimization problem to a Markov Decision Process problem and propose an Error Probability-Controlled Action-Masked Proximal Policy Optimization (EMPPO) algorithm. Via simulation, we show that the proposed design achieves a lower AuD than baseline methods across various network conditions, especially in scenarios with significant channel Signal-to-Noise Ratio (SNR) differences and low average SNR, which shows the robustness of EMPPO and its potential for real-time applications.
Published: 2024
Full Text: View/download PDF

21. Deep-Reinforcement-Learning-Based Joint Energy Replenishment and Data Collection Scheme for WRSN

Author: Jishan Li, Zhichao Deng, Yong Feng, and Nianbo Liu
Subjects: wireless rechargeable sensor networks, unmanned aerial vehicles, deep reinforcement learning, route protocol, Chemical technology, TP1-1185
Abstract: With the emergence of wireless rechargeable sensor networks (WRSNs), the possibility of wirelessly recharging nodes using mobile charging vehicles (MCVs) has become a reality. However, existing approaches overlook the effective integration of node energy replenishment and mobile data collection processes. In this paper, we propose a joint energy replenishment and data collection scheme (D-JERDG) for WRSNs based on deep reinforcement learning. By capitalizing on the high mobility of unmanned aerial vehicles (UAVs), D-JERDG enables continuous visits to the cluster head nodes in each cluster, facilitating data collection and range-based charging. First, D-JERDG utilizes the K-means algorithm to partition the network into multiple clusters, and a cluster head selection algorithm is proposed based on an improved dynamic routing protocol, which elects cluster head nodes based on the remaining energy and geographical location of the cluster member nodes. Afterward, the simulated annealing (SA) algorithm determines the shortest flight path. Subsequently, the DRL model multiobjective deep deterministic policy gradient (MODDPG) is employed to control and optimize the UAV instantaneous heading and speed, effectively planning UAV hover points. By redesigning the reward function, joint optimization of multiple objectives such as node death rate, UAV throughput, and average flight energy consumption is achieved. Extensive simulation results show that the proposed D-JERDG achieves joint optimization of multiple objectives and exhibits significant advantages over the baseline in terms of throughput, time utilization, and charging cost, among other indicators.
Published: 2024
Full Text: View/download PDF

22. Robust Offloading for Edge Computing-Assisted Sensing and Communication Systems: A Deep Reinforcement Learning Approach

Author: Li Shen, Bin Li, and Xiaojie Zhu
Subjects: integrated communication and sensing, mobile edge computing, deep reinforcement learning, robust design, computation uncertainty, Chemical technology, TP1-1185
Abstract: In this paper, we consider an integrated sensing, communication, and computation (ISCC) system to alleviate the spectrum congestion and computation burden problem. Specifically, while serving communication users, a base station (BS) actively engages in sensing targets and collaborates seamlessly with the edge server to concurrently process the acquired sensing data for efficient target recognition. A significant challenge in edge computing systems arises from the inherent uncertainty in computations, mainly stemming from the unpredictable complexity of tasks. With this consideration, we address the computation uncertainty by formulating a robust communication and computing resource allocation problem in ISCC systems. The primary goal of the system is to minimize total energy consumption while adhering to perception and delay constraints. This is achieved through the optimization of transmit beamforming, offloading ratio, and computing resource allocation, effectively managing the trade-offs between local execution and edge computing. To overcome this challenge, we employ a Markov decision process (MDP) in conjunction with the proximal policy optimization (PPO) algorithm, establishing an adaptive learning strategy. The proposed algorithm stands out for its rapid training speed, ensuring compliance with latency requirements for perception and computation in applications. Simulation results highlight its robustness and effectiveness within ISCC systems compared to baseline approaches.
Published: 2024
Full Text: View/download PDF

23. Inspection Robot Navigation Based on Improved TD3 Algorithm

Author: Bo Huang, Jiacheng Xie, and Jiawei Yan
Subjects: inspection robot navigation, deep reinforcement learning, long- and short-term memory, curiosity-driven, Chemical technology, TP1-1185
Abstract: The swift advancements in robotics have rendered navigation an essential task for mobile robots. While map-based navigation methods depend on global environmental maps for decision-making, their efficacy in unfamiliar or dynamic settings falls short. Current deep reinforcement learning navigation strategies can navigate successfully without pre-existing map data, yet they grapple with issues like inefficient training, slow convergence, and infrequent rewards. To tackle these challenges, this study introduces an improved two-delay depth deterministic policy gradient algorithm (LP-TD3) for local planning navigation. Initially, the integration of the long–short-term memory (LSTM) module with the Prioritized Experience Re-play (PER) mechanism into the existing TD3 framework was performed to optimize training and improve the efficiency of experience data utilization. Furthermore, the incorporation of an Intrinsic Curiosity Module (ICM) merges intrinsic with extrinsic rewards to tackle sparse reward problems and enhance exploratory behavior. Experimental evaluations using ROS and Gazebo simulators demonstrate that the proposed method outperforms the original on various performance metrics.
Published: 2024
Full Text: View/download PDF

24. Adaptive Cruise Control Based on Safe Deep Reinforcement Learning

Author: Rui Zhao, Kui Wang, Wenbo Che, Yun Li, Yuze Fan, and Fei Gao
Subjects: autonomous driving, adaptive cruise control, safety aware, deep reinforcement learning, projected constrained policy optimization, Chemical technology, TP1-1185
Abstract: Adaptive cruise control (ACC) enables efficient, safe, and intelligent vehicle control by autonomously adjusting speed and ensuring a safe following distance from the vehicle in front. This paper proposes a novel adaptive cruise system, namely the Safety-First Reinforcement Learning Adaptive Cruise Control (SFRL-ACC). This system aims to leverage the model-free nature and high real-time inference efficiency of Deep Reinforcement Learning (DRL) to overcome the challenges of modeling difficulties and lower computational efficiency faced by current optimization control-based ACC methods while simultaneously maintaining safety advantages and optimizing ride comfort. Firstly, we transform the ACC problem into a safe DRL formulation Constrained Markov Decision Process (CMDP) by carefully designing state, action, reward, and cost functions. Subsequently, we propose the Projected Constrained Policy Optimization (PCPO)-based ACC Algorithm SFRL-ACC, which is specifically tailored to solve the CMDP problem. PCPO incorporates safety constraints that further restrict the trust region formed by the Kullback–Leibler (KL) divergence, facilitating DRL policy updates that maximize performance while keeping safety costs within their limit bounds. Finally, we train an SFRL-ACC policy and compare its computation time, traffic efficiency, ride comfort, and safety with state-of-the-art MPC-based ACC control methods. The experimental results prove the superiority of the proposed method in the aforementioned performance aspects.
Published: 2024
Full Text: View/download PDF

25. Task Offloading Strategy for Unmanned Aerial Vehicle Power Inspection Based on Deep Reinforcement Learning

Author: Wei Zhuang, Fanan Xing, and Yuhang Lu
Subjects: multiple UAVs, mobile edge computing, deep reinforcement learning, power inspection, offloading strategy, electric power IoT, Chemical technology, TP1-1185
Abstract: With the ongoing advancement of electric power Internet of Things (IoT), traditional power inspection methods face challenges such as low efficiency and high risk. Unmanned aerial vehicles (UAVs) have emerged as a more efficient solution for inspecting power facilities due to their high maneuverability, excellent line-of-sight communication capabilities, and strong adaptability. However, UAVs typically grapple with limited computational power and energy resources, which constrain their effectiveness in handling computationally intensive and latency-sensitive inspection tasks. In response to this issue, we propose a UAV task offloading strategy based on deep reinforcement learning (DRL), which is designed for power inspection scenarios consisting of mobile edge computing (MEC) servers and multiple UAVs. Firstly, we propose an innovative UAV-Edge server collaborative computing architecture to fully exploit the mobility of UAVs and the high-performance computing capabilities of MEC servers. Secondly, we established a computational model concerning energy consumption and task processing latency in the UAV power inspection system, enhancing our understanding of the trade-offs involved in UAV offloading strategies. Finally, we formalize the task offloading problem as a multi-objective optimization issue and simultaneously model it as a Markov Decision Process (MDP). Subsequently, we proposed a task offloading algorithm based on a Deep Deterministic Policy Gradient (OTDDPG) to obtain the optimal task offloading strategy for UAVs. The simulation results demonstrated that this approach outperforms baseline methods with significant improvements in task processing latency and energy consumption.
Published: 2024
Full Text: View/download PDF

26. Multi-User Computation Offloading and Resource Allocation Algorithm in a Vehicular Edge Network

Author: Xiangyan Liu, Jianhong Zheng, Meng Zhang, Yang Li, Rui Wang, and Yun He
Subjects: Vehicular Edge Computing Network (VECN), computation offloading, resource allocation, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: In Vehicular Edge Computing Network (VECN) scenarios, the mobility of vehicles causes the uncertainty of channel state information, which makes it difficult to guarantee the Quality of Service (QoS) in the process of computation offloading and the resource allocation of a Vehicular Edge Computing Server (VECS). A multi-user computation offloading and resource allocation optimization model and a computation offloading and resource allocation algorithm based on the Deep Deterministic Policy Gradient (DDPG) are proposed to address this problem. Firstly, the problem is modeled as a Mixed Integer Nonlinear Programming (MINLP) problem according to the optimization objective of minimizing the total system delay. Then, in response to the large state space and the coexistence of discrete and continuous variables in the action space, a reinforcement learning algorithm based on DDPG is proposed. Finally, the proposed method is used to solve the problem and compared with the other three benchmark schemes. Compared with the baseline algorithms, the proposed scheme can effectively select the task offloading mode and reasonably allocate VECS computing resources, ensure the QoS of task execution, and have a certain stability and scalability. Simulation results show that the total completion time of the proposed scheme can be reduced by 24–29% compared with the existing state-of-the-art techniques.
Published: 2024
Full Text: View/download PDF

27. Deep Reinforcement Learning-Based Resource Management in Maritime Communication Systems

Author: Xi Yao, Yingdong Hu, Yicheng Xu, and Ruifeng Gao
Subjects: deep reinforcement learning, beam allocation scheme, deep Q-network, Chemical technology, TP1-1185
Abstract: With the growing maritime economy, ensuring the quality of communication for maritime users has become imperative. The maritime communication system based on nearshore base stations enhances the communication rate of maritime users through dynamic resource allocation. A virtual queue-based deep reinforcement learning beam allocation scheme is proposed in this paper, aiming to maximize the communication rate. More particularly, to reduce the complexity of resource management, we employ a grid-based method to discretize the maritime environment. For the combinatorial optimization problem of grid and beam allocation under unknown channel state information, we model it as a sequential decision process of resource allocation. The nearshore base station is modeled as a learning agent, continuously interacting with the environment to optimize beam allocation schemes using deep reinforcement learning techniques. Furthermore, we guarantee that grids with poor channel state information can be serviced through the virtual queue method. Finally, the simulation results provided show that our proposed beam allocation scheme is beneficial in terms of increasing the communication rate.
Published: 2024
Full Text: View/download PDF

28. Dynamic Intelligent Scheduling in Low-Carbon Heterogeneous Distributed Flexible Job Shops with Job Insertions and Transfers

Author: Yi Chen, Xiaojuan Liao, Guangzhu Chen, and Yingjie Hou
Subjects: heterogeneous distributed flexible job shop, dynamic scheduling, low-carbon, deep reinforcement learning, Rainbow DQN, Chemical technology, TP1-1185
Abstract: With the rapid development of economic globalization and green manufacturing, traditional flexible job shop scheduling has evolved into the low-carbon heterogeneous distributed flexible job shop scheduling problem (LHDFJSP). Additionally, modern smart manufacturing processes encounter complex and diverse contingencies, necessitating the ability to address dynamic events in real-world production activities. To date, there are limited studies that comprehensively address the intricate factors associated with the LHDFJSP, including workshop heterogeneity, job insertions and transfers, and considerations of low-carbon objectives. This paper establishes a multi-objective mathematical model with the goal of minimizing the total weighted tardiness and total energy consumption. To effectively solve this problem, diverse composite scheduling rules are formulated, alongside the application of a deep reinforcement learning (DRL) framework, i.e., Rainbow deep-Q network (Rainbow DQN), to learn the optimal scheduling strategy at each decision point in a dynamic environment. To verify the effectiveness of the proposed method, this paper extends the standard dataset to adapt to the LHDFJSP. Evaluation results confirm the generalization and robustness of the presented Rainbow DQN-based method.
Published: 2024
Full Text: View/download PDF

29. Deep Reinforcement Learning-Based Energy Consumption Optimization for Peer-to-Peer (P2P) Communication in Wireless Sensor Networks

Author: Jinyu Yuan, Jingyi Peng, Qing Yan, Gang He, Honglin Xiang, and Zili Liu
Subjects: wireless sensor networks, peer-to-peer communication, energy consumption, power control, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: The fast development of the sensors in the wireless sensor networks (WSN) brings a big challenge of low energy consumption requirements, and Peer-to-peer (P2P) communication becomes the important way to break this bottleneck. However, the interference caused by different sensors sharing the spectrum and the power limitations seriously constrains the improvement of WSN. Therefore, in this paper, we proposed a deep reinforcement learning-based energy consumption optimization for P2P communication in WSN. Specifically, P2P sensors (PUs) are considered agents to share the spectrum of authorized sensors (AUs). An authorized sensor has permission to access specific data or systems, while a P2P sensor directly communicates with other sensors without needing a central server. One involves permission, the other is direct communication between sensors. Each agent can control the power and select the resources to avoid interference. Moreover, we use a double deep Q network (DDQN) algorithm to help the agent learn more detailed features of the interference. Simulation results show that the proposed algorithm can obtain a higher performance than the deep Q network scheme and the traditional algorithm, which can effectively lower the energy consumption for P2P communication in WSN.
Published: 2024
Full Text: View/download PDF

30. Learn to Bet: Using Reinforcement Learning to Improve Vehicle Bids in Auction-Based Smart Intersections

Author: Giacomo Cabri, Matteo Lugli, Manuela Montangero, and Filippo Muzzini
Subjects: deep reinforcement learning, smart city, intersection management, auctions, connected vehicles, autonomous vehicles, Chemical technology, TP1-1185
Abstract: With the advent of IoT, cities will soon be populated by autonomous vehicles and managed by intelligent systems capable of actively interacting with city infrastructures and vehicles. In this work, we propose a model based on reinforcement learning that teaches to autonomous connected vehicles how to save resources while navigating in such an environment. In particular, we focus on budget savings in the context of auction-based intersection management systems. We trained several models with Deep Q-learning by varying traffic conditions to find the most performance-effective variant in terms of the trade-off between saved currency and trip times. Afterward, we compared the performance of our model with previously proposed and random strategies, even under adverse traffic conditions. Our model appears to be robust and manages to save a considerable amount of currency without significantly increasing the waiting time in traffic. For example, the learner bidder saves at least 20% of its budget with heavy traffic conditions and up to 74% in lighter traffic with respect to a standard bidder, and around three times the saving of a random bidder. The results and discussion suggest practical adoption of the proposal in a foreseen future real-life scenario.
Published: 2024
Full Text: View/download PDF

31. GA-Dueling DQN Jamming Decision-Making Method for Intra-Pulse Frequency Agile Radar

Author: Liqun Xia, Lulu Wang, Zhidong Xie, and Xin Gao
Subjects: optimizing jamming strategies, jamming-to-noise ratio, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: Optimizing jamming strategies is crucial for enhancing the performance of cognitive jamming systems in dynamic electromagnetic environments. The emergence of frequency-agile radars, capable of changing the carrier frequency within or between pulses, poses significant challenges for the jammer to make intelligent decisions and adapt to the dynamic environment. This paper focuses on researching intelligent jamming decision-making algorithms for Intra-Pulse Frequency Agile Radar using deep reinforcement learning. Intra-Pulse Frequency Agile Radar achieves frequency agility at the sub-pulse level, creating a significant frequency agility space. This presents challenges for traditional jamming decision-making methods to rapidly learn its changing patterns through interactions. By employing Gated Recurrent Units (GRU) to capture long-term dependencies in sequence data, together with the attention mechanism, this paper proposes a GA-Dueling DQN (GRU-Attention-based Dueling Deep Q Network) method for jamming frequency selection. Simulation results indicate that the proposed method outperforms traditional Q-learning, DQN, and Dueling DQN methods in terms of jamming effectiveness. It exhibits the fastest convergence speed and reduced reliance on prior knowledge, highlighting its significant advantages in jamming the subpulse-level frequency-agile radar.
Published: 2024
Full Text: View/download PDF

32. Coordinated Decision Control of Lane-Change and Car-Following for Intelligent Vehicle Based on Time Series Prediction and Deep Reinforcement Learning

Author: Kun Zhang, Tonglin Pu, Qianxi Zhang, and Zhigen Nie
Subjects: intelligent vehicles, time series prediction, deep reinforcement learning, lane-change and car-following, condition identification, trajectory planning, Chemical technology, TP1-1185
Abstract: Adaptive cruise control and autonomous lane-change systems represent pivotal advancements in intelligent vehicle technology. To enhance the operational efficiency of intelligent vehicles in combined lane-change and car-following scenarios, we propose a coordinated decision control model based on hierarchical time series prediction and deep reinforcement learning under the influence of multiple surrounding vehicles. Firstly, we analyze the lane-change behavior and establish boundary conditions for safe lane-change, and divide the lane-change trajectory planning problem into longitudinal velocity planning and lateral trajectory planning. LSTM network is introduced to predict the driving states of surrounding vehicles in multi-step time series, combining D3QN algorithm to make decisions on lane-change behavior. Secondly, based on the following state between the ego vehicle and the leader vehicle in the initial lane, as well as the relationship between the initial distance and the expected distance with the leader vehicle in the target lane, with the primary objective of maximizing driving efficiency, longitudinal velocity is planned based on driving conditions recognition. The lateral trajectory and conditions recognition are then planned using the GA-LSTM-BP algorithm. In contrast to conventional adaptive cruise control systems, the DDPG algorithm serves as the lower-level control model for car-following, enabling continuous velocity control. The proposed model is subsequently simulated and validated using the NGSIM dataset and a lane-change scenarios dataset. The results demonstrate that the algorithm facilitates intelligent vehicle lane-change and car-following coordinated control while ensuring safety and stability during lane-changes. Comparative analysis with other decision control models reveals a notable 17.58% increase in driving velocity, underscoring the algorithm’s effectiveness in improving driving efficiency.
Published: 2024
Full Text: View/download PDF

33. Path Following for Autonomous Mobile Robots with Deep Reinforcement Learning

Author: Yu Cao, Kan Ni, Takahiro Kawaguchi, and Seiji Hashimoto
Subjects: autonomous mobile robot, path following, velocity control, deep reinforcement learning, soft actor-critic, Chemical technology, TP1-1185
Abstract: Autonomous mobile robots have become integral to daily life, providing crucial services across diverse domains. This paper focuses on path following, a fundamental technology and critical element in achieving autonomous mobility. Existing methods predominantly address tracking through steering control, neglecting velocity control or relying on path-specific reference velocities, thereby constraining their generality. In this paper, we propose a novel approach that integrates the conventional pure pursuit algorithm with deep reinforcement learning for a nonholonomic mobile robot. Our methodology employs pure pursuit for steering control and utilizes the soft actor-critic algorithm to train a velocity control strategy within randomly generated path environments. Through simulation and experimental validation, our approach exhibits notable advancements in path convergence and adaptive velocity adjustments to accommodate paths with varying curvatures. Furthermore, this method holds the potential for broader applicability to vehicles adhering to nonholonomic constraints beyond the specific model examined in this paper. In summary, our study contributes to the progression of autonomous mobility by harmonizing conventional algorithms with cutting-edge deep reinforcement learning techniques, enhancing the robustness of path following.
Published: 2024
Full Text: View/download PDF

34. Deep Reinforcement Learning-Based Power Allocation for Minimizing Age of Information and Energy Consumption in Multi-Input Multi-Output and Non-Orthogonal Multiple Access Internet of Things Systems

Author: Qiong Wu, Zheng Zhang, Hongbiao Zhu, Pingyi Fan, Qiang Fan, Huiling Zhu, and Jiangzhou Wang
Subjects: deep reinforcement learning, age of information, MIMO-NOMA, Internet of Things, Chemical technology, TP1-1185
Abstract: Multi-input multi-output and non-orthogonal multiple access (MIMO-NOMA) Internet-of-Things (IoT) systems can improve channel capacity and spectrum efficiency distinctly to support real-time applications. Age of information (AoI) plays a crucial role in real-time applications as it determines the timeliness of the extracted information. In MIMO-NOMA IoT systems, the base station (BS) determines the sample collection commands and allocates the transmit power for each IoT device. Each device determines whether to sample data according to the sample collection commands and adopts the allocated power to transmit the sampled data to the BS over the MIMO-NOMA channel. Afterwards, the BS employs the successive interference cancellation (SIC) technique to decode the signal of the data transmitted by each device. The sample collection commands and power allocation may affect the AoI and energy consumption of the system. Optimizing the sample collection commands and power allocation is essential for minimizing both AoI and energy consumption in MIMO-NOMA IoT systems. In this paper, we propose the optimal power allocation to achieve it based on deep reinforcement learning (DRL). Simulations have demonstrated that the optimal power allocation effectively achieves lower AoI and energy consumption compared to other algorithms. Overall, the reward is reduced by 6.44% and 11.78% compared the to GA algorithm and random algorithm, respectively.
Published: 2023
Full Text: View/download PDF

35. Intelligent Vehicle Decision-Making and Trajectory Planning Method Based on Deep Reinforcement Learning in the Frenet Space

Author: Jiawei Wang, Liang Chu, Yao Zhang, Yabin Mao, and Chong Guo
Subjects: autonomous driving, deep reinforcement learning, behavior decision making, trajectory planning, Chemical technology, TP1-1185
Abstract: The complexity inherent in navigating intricate traffic environments poses substantial hurdles for intelligent driving technology. The continual progress in mapping and sensor technologies has equipped vehicles with the capability to intricately perceive their exact position and the intricate interplay among surrounding traffic elements. Building upon this foundation, this paper introduces a deep reinforcement learning method to solve the decision-making and trajectory planning problem of intelligent vehicles. The method employs a deep learning framework for feature extraction, utilizing a grid map generated from a blend of static environmental markers such as road centerlines and lane demarcations, in addition to dynamic environmental cues including vehicle positions across varied lanes, all harmonized within the Frenet coordinate system. The grid map serves as the input for the state space, and the input for the action space comprises a vector encompassing lane change timing, velocity, and vertical displacement at the lane change endpoint. To optimize the action strategy, a reinforcement learning approach is employed. The feasibility, stability, and efficiency of the proposed method are substantiated via experiments conducted in the CARLA simulator across diverse driving scenarios, and the proposed method can increase the average success rate of lane change by 6.8% and 13.1% compared with the traditional planning control algorithm and the simple reinforcement learning method.
Published: 2023
Full Text: View/download PDF

36. Joint Task Offloading and Resource Allocation for Intelligent Reflecting Surface-Aided Integrated Sensing and Communication Systems Using Deep Reinforcement Learning Algorithm

Author: Liu Yang, Yifei Wei, and Xiaojun Wang
Subjects: integrated sensing and communication, intelligent reflecting surface, deep reinforcement learning, resource allocation, Chemical technology, TP1-1185
Abstract: This paper investigates an intelligent reflecting surface (IRS)-aided integrated sensing and communication (ISAC) framework to cope with the problem of spectrum scarcity and poor wireless environment. The main goal of the proposed framework in this work is to optimize the overall performance of the system, including sensing, communication, and computational offloading. We aim to achieve the trade-off between system performance and overhead by optimizing spectrum and computing resource allocation. On the one hand, the joint design of transmit beamforming and phase shift matrices can enhance the radar sensing quality and increase the communication data rate. On the other hand, task offloading and computation resource allocation optimize energy consumption and delay. Due to the coupled and high dimension optimization variables, the optimization problem is non-convex and NP-hard. Meanwhile, given the dynamic wireless channel condition, we formulate the optimization design as a Markov decision process. To tackle this complex optimization problem, we proposed two innovative deep reinforcement learning (DRL)-based schemes. Specifically, a deep deterministic policy gradient (DDPG) method is proposed to address the continuous high-dimensional action space, and the prioritized experience replay is adopted to speed up the convergence process. Then, a twin delayed DDPG algorithm is designed based on this DRL framework. Numerical results confirm the effectiveness of proposed schemes compared with the benchmark methods.
Published: 2023
Full Text: View/download PDF

37. Enhancing Short Track Speed Skating Performance through Improved DDQN Tactical Decision Model

Author: Yuanbo Yang, Feimo Li, and Hongxing Chang
Subjects: short track speed skating, deep reinforcement learning, decision-making method, deep Q-network, competition performance improvement, Chemical technology, TP1-1185
Abstract: This paper studies the tactical decision-making model of short track speed skating based on deep reinforcement learning, so as to improve the competitive performance of corresponding short track speed skaters. Short track speed skating, a traditional discipline in the Winter Olympics since its establishment in 1988, has consistently garnered attention. As artificial intelligence continues to advance, the utilization of deep learning methods to enhance athletes’ tactical decision-making capabilities has become increasingly prevalent. Traditional tactical decision techniques often rely on the experience and knowledge of coaches and video analysis methods that require a lot of time and effort. Consequently, this study proposes a scientific simulation environment for short track speed skating, that accurately simulates the physical attributes of the venue, the physiological fitness of the athletes, and the rules of the competition. The Double Deep Q-Network (DDQN) model is enhanced and utilized, with improvements to the reward function and the distinct description of four tactics. This enables agents to learn optimal tactical decisions in various competitive states with a simulation environment. Experimental results demonstrate that this approach effectively enhances the competition performance and physiological fitness allocation of short track speed skaters.
Published: 2023
Full Text: View/download PDF

38. CMID: Crossmodal Image Denoising via Pixel-Wise Deep Reinforcement Learning

Author: Yi Guo, Yuanhang Gao, Bingliang Hu, Xueming Qian, and Dong Liang
Subjects: deep reinforcement learning, pixel-wise image processing, crossmodal image denoising, infrared image denoising, terahertz image denoising, Chemical technology, TP1-1185
Abstract: Removing noise from acquired images is a crucial step in various image processing and computer vision tasks. However, the existing methods primarily focus on removing specific noise and ignore the ability to work across modalities, resulting in limited generalization performance. Inspired by the iterative procedure of image processing used by professionals, we propose a pixel-wise crossmodal image-denoising method based on deep reinforcement learning to effectively handle noise across modalities. We proposed a similarity reward to help teach an optimal action sequence to model the step-wise nature of the human processing process explicitly. In addition, We designed an action set capable of handling multiple types of noise to construct the action space, thereby achieving successful crossmodal denoising. Extensive experiments against state-of-the-art methods on publicly available RGB, infrared, and terahertz datasets demonstrate the superiority of our method in crossmodal image denoising.
Published: 2023
Full Text: View/download PDF

39. Joint Beamforming and Phase Shifts Design for RIS-Aided Multi-User Full-Duplex Systems in Smart Cities

Author: Kunbei Pan, Bin Zhou, Wei Zhang, and Cheng Ju
Subjects: full-duplex, reconfigurable intelligent surface, spectral efficiency, beamforming, phase shifts, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: Full-duplex (FD) and reconfigurable intelligent surface (RIS) are potential technologies for achieving wireless communication effectively. Therefore, in theory, the RIS-aided FD system is supposed to enhance spectral efficiency significantly for the ubiquitous Internet of Things devices in smart cities. However, this technology additionally induces the loop-interference (LI) of RIS on the residual self-interference (SI) of the FD base station, especially in complicated urban outdoor environments, which will somewhat counterbalance the performance benefit. Inspired by this, we first establish an objective and constraints considering the residual SI and LI in two typical urban outdoor scenarios. Then, we decompose the original problem into two subproblems according to the variable types and jointly design the beamforming matrices and phase shifts vector methods. Specifically, we propose a successive convex approximation algorithm and a soft actor–critic deep reinforcement learning-related scheme to solve the subproblems alternately. To prove the effectiveness of our proposal, we introduce benchmarks of RIS phase shifts design for comparison. The simulation results show that the performance of the low-complexity proposed algorithm is only slightly lower than the exhaustive search method and outperforms the fixed-point iteration scheme. Moreover, the proposal in scenario two is more outstanding, demonstrating the application predominance in urban outdoor environments.
Published: 2023
Full Text: View/download PDF

40. EvacuAI: An Analysis of Escape Routes in Indoor Environments with the Aid of Reinforcement Learning

Author: Anna Carolina Rosa, Mariana Cabral Falqueiro, Rodrigo Bonacin, Fábio Lúcio Lopes de Mendonça, Geraldo Pereira Rocha Filho, and Vinícius Pereira Gonçalves
Subjects: machine learning, deep reinforcement learning, transfer learning, fire, evacuation, Chemical technology, TP1-1185
Abstract: There is only a very short reaction time for people to find the best way out of a building in a fire outbreak. Software applications can be used to assist the rapid evacuation of people from the building; however, this is an arduous task, which requires an understanding of advanced technologies. Since well-known pathway algorithms (such as, Dijkstra, Bellman–Ford, and A*) can lead to serious performance problems, when it comes to multi-objective problems, we decided to make use of deep reinforcement learning techniques. A wide range of strategies including a random initialization of replay buffer and transfer learning were assessed in three projects involving schools of different sizes. The results showed the proposal was viable and that in most cases the performance of transfer learning was superior, enabling the learning agent to be trained in times shorter than 1 min, with 100% accuracy in the routes. In addition, the study raised challenges that had to be faced in the future.
Published: 2023
Full Text: View/download PDF

41. A Deep Reinforcement Learning Approach to Droplet Routing for Erroneous Digital Microfluidic Biochips

Author: Tomohisa Kawakami, Chiharu Shiro, Hiroki Nishikawa, Xiangbo Kong, Hiroyuki Tomiyama, and Shigeru Yamashita
Subjects: biochips, digital microfluidic biochips, deep reinforcement learning, optimization, Chemical technology, TP1-1185
Abstract: Digital microfluidic biochips (DMFBs), which are used in various fields like DNA analysis, clinical diagnosis, and PCR testing, have made biochemical experiments more compact, efficient, and user-friendly than the previous methods. However, their reliability is often compromised by their inability to adapt to all kinds of errors. Errors in biochips can be categorized into two types: known errors, and unknown errors. Known errors are detectable before the start of the routing process using sensors or cameras. Unknown errors, in contrast, only become apparent during the routing process and remain undetected by sensors or cameras, which can unexpectedly stop the routing process and diminish the reliability of biochips. This paper introduces a deep reinforcement learning-based routing algorithm, designed to manage not only known errors but also unknown errors. Our experiments demonstrated that our algorithm outperformed the previous ones in terms of the success rate of the routing, in the scenarios including both known errors and unknown errors. Additionally, our algorithm contributed to detecting unknown errors during the routing process, identifying the most efficient routing path with a high probability.
Published: 2023
Full Text: View/download PDF

42. An Optimization Method for Non-IID Federated Learning Based on Deep Reinforcement Learning

Author: Xutao Meng, Yong Li, Jianchao Lu, and Xianglin Ren
Subjects: federated learning, deep reinforcement learning, non-IID, client selection, Chemical technology, TP1-1185
Abstract: Federated learning (FL) is a distributed machine learning paradigm that enables a large number of clients to collaboratively train models without sharing data. However, when the private dataset between clients is not independent and identically distributed (non-IID), the local training objective is inconsistent with the global training objective, which possibly causes the convergence speed of FL to slow down, or even not converge. In this paper, we design a novel FL framework based on deep reinforcement learning (DRL), named FedRLCS. In FedRLCS, we primarily improved the greedy strategy and action space of the double DQN (DDQN) algorithm, enabling the server to select the optimal subset of clients from a non-IID dataset to participate in training, thereby accelerating model convergence and reaching the target accuracy in fewer communication epochs. In simulation experiments, we partition multiple datasets with different strategies to simulate non-IID on local clients. We adopt four models (LeNet-5, MobileNetV2, ResNet-18, ResNet-34) on the four datasets (CIFAR-10, CIFAR-100, NICO, Tiny ImageNet), respectively, and conduct comparative experiments with five state-of-the-art non-IID FL methods. Experimental results show that FedRLCS reduces the number of communication rounds required by 10–70% with the same target accuracy without increasing the computation and storage costs for all clients.
Published: 2023
Full Text: View/download PDF

43. Multivariable Coupled System Control Method Based on Deep Reinforcement Learning

Author: Jin Xu, Han Li, and Qingxin Zhang
Subjects: multivariate coupled system, deep reinforcement learning, control system, PPO, normalization, Chemical technology, TP1-1185
Abstract: Due to the multi-loop coupling characteristics of multivariable systems, it is difficult for traditional control methods to achieve precise control effects. Therefore, this paper proposes a control method based on deep reinforcement learning to achieve stable and accurate control of multivariable coupling systems. Based on the proximal policy optimization algorithm (PPO), this method selects tanh as the activation function and normalizes the advantage function. At the same time, based on the characteristics of the multivariable coupling system, the reward function and controller are redesigned structures, achieving stable and precise control of the controlled system. In addition, this study used the amplitude of the control quantity output by the controller as an indicator to evaluate the controller’s performance. Finally, simulation verification was conducted in MATLAB/Simulink. The experimental results show that compared with decentralized control, decoupled control and traditional PPO control, the method proposed in this article achieves better control effects.
Published: 2023
Full Text: View/download PDF

44. Path Planning for Unmanned Surface Vehicles with Strong Generalization Ability Based on Improved Proximal Policy Optimization

Author: Pengqi Sun, Chunxi Yang, Xiaojie Zhou, and Wenbo Wang
Subjects: USV, path planning, deep reinforcement learning, deep neural network, generalization, perception, Chemical technology, TP1-1185
Abstract: To solve the problems of path planning and dynamic obstacle avoidance for an unmanned surface vehicle (USV) in a locally observable non-dynamic ocean environment, a visual perception and decision-making method based on deep reinforcement learning is proposed. This method replaces the full connection layer in the Proximal Policy Optimization (PPO) neural network structure with a convolutional neural network (CNN). In this way, the degree of memorization and forgetting of sample information is controlled. Moreover, this method accumulates reward models faster by preferentially learning samples with high reward values. From the USV-centered radar perception input of the local environment, the output of the action is realized through an end-to-end learning model, and the environment perception and decision are formed as a closed loop. Thus, the proposed algorithm has good adaptability in different marine environments. The simulation results show that, compared with the PPO algorithm, Soft Actor–Critic (SAC) algorithm, and Deep Q Network (DQN) algorithm, the proposed algorithm can accelerate the model convergence speed and improve the path planning performances in partly or fully unknown ocean fields.
Published: 2023
Full Text: View/download PDF

45. Intelligent Hierarchical Admission Control for Low-Earth Orbit Satellites Based on Deep Reinforcement Learning

Author: Debin Wei, Chuanqi Guo, and Li Yang
Subjects: low-earth orbit, satellite communication, deep reinforcement learning, resource allocation, admission control, Chemical technology, TP1-1185
Abstract: Low-Earth orbit (LEO) satellites have limited on-board resources, user terminals are unevenly distributed in the constantly changing coverage area, and the service requirements vary significantly. It is urgent to optimize resource allocation under the constraint of limited satellite spectrum resources and ensure the fairness of service admission control. Therefore, we propose an intelligent hierarchical admission control (IHAC) strategy based on deep reinforcement learning (DRL). This strategy combines the deep deterministic policy gradient (DDPG) and the deep Q network (DQN) intelligent algorithm to construct upper and lower hierarchical resource allocation and admission control frameworks. The upper controller considers the state features of each ground zone and satellite resources from a global perspective, and determines the beam resource allocation ratio of each ground zone. The lower controller formulates the admission control policy based on the decision of the upper controller and the detailed information of the users’ services. At the same time, a designed reward and punishment mechanism is used to optimize the decisions of the upper and lower controllers. The fairness of users’ services admissions in each ground zone is achieved as far as possible while ensuring the reasonable allocation of beam resources among zones. Finally, online decision-making and offline learning were combined, so that the controller could make full use of a large number of historical data to learn and generate intelligent strategies with stronger adaptive ability while interacting with the network environment in real time. A large number of simulation results show that IHAC has better performance in terms of a successful service admission rate, service drop rate, and fair resource allocation. Among them, the number of accepted services increased by 20.36% on average, the packet loss rate decreased by 17.56% on average, and the resource fairness increased by 17.16% on average.
Published: 2023
Full Text: View/download PDF

46. Research on Deep Reinforcement Learning Control Algorithm for Active Suspension Considering Uncertain Time Delay

Author: Yang Wang, Cheng Wang, Shijie Zhao, and Konghui Guo
Subjects: active suspension, deep reinforcement learning, suspension control, uncertain time delay, Chemical technology, TP1-1185
Abstract: The uncertain delay characteristic of actuators is a critical factor that affects the control effectiveness of the active suspension system. Therefore, it is crucial to develop a control algorithm that takes into account this uncertain delay in order to ensure stable control performance. This study presents a novel active suspension control algorithm based on deep reinforcement learning (DRL) that specifically addresses the issue of uncertain delay. In this approach, a twin-delayed deep deterministic policy gradient (TD3) algorithm with system delay is employed to obtain the optimal control policy by iteratively solving the dynamic model of the active suspension system, considering the delay. Furthermore, three different operating conditions were designed for simulation to evaluate the control performance: deterministic delay, semi-regular delay, and uncertain delay. The experimental results demonstrate that the proposed algorithm achieves excellent control performance under various operating conditions. Compared to passive suspension, the optimization of body vertical acceleration is improved by more than 30%, and the proposed algorithm effectively mitigates body vibration in the low frequency range. It consistently maintains a more than 30% improvement in ride comfort optimization even under the most severe operating conditions and at different speeds, demonstrating the algorithm’s potential for practical application.
Published: 2023
Full Text: View/download PDF

47. Perimeter Control Method of Road Traffic Regions Based on MFD-DDPG

Author: Guorong Zheng, Yuke Liu, Yazhou Fu, Yingjie Zhao, and Zundong Zhang
Subjects: perimeter control, macroscopic fundamental diagram, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: As urban areas continue to expand, traffic congestion has emerged as a significant challenge impacting urban governance and economic development. Frequent regional traffic congestion has become a primary factor hindering urban economic growth and social activities, necessitating improved regional traffic management. Addressing regional traffic optimization and control methods based on the characteristics of regional congestion has become a crucial and complex issue in the field of traffic management and control research. This paper focuses on the macroscopic fundamental diagram (MFD) and aims to tackle the control problem without relying on traffic determination information. To address this, we introduce the Q-learning (QL) algorithm in reinforcement learning and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning. Subsequently, we propose the MFD-QL perimeter control model and the MFD-DDPG perimeter control model. We conduct numerical analysis and simulation experiments to verify the effectiveness of the MFD-QL and MFD-DDPG algorithms. The experimental results show that the algorithms converge rapidly to a stable state and achieve superior control effects in optimizing regional perimeter control.
Published: 2023
Full Text: View/download PDF

48. Task Offloading Decision-Making Algorithm for Vehicular Edge Computing: A Deep-Reinforcement-Learning-Based Approach

Author: Wei Shi, Long Chen, and Xia Zhu
Subjects: computation offloading, vehicular edge computing, deep reinforcement learning, Chemical technology, TP1-1185
Abstract: Efficient task offloading decision is a crucial technology in vehicular edge computing, which aims to fulfill the computational performance demands of complex vehicular tasks with respect to delay and energy consumption while minimizing network resource competition and consumption. Conventional distributed task offloading decisions rely solely on the local state of the vehicle, failing to optimize the utilization of the server’s resources to its fullest potential. In addition, the mobility aspect of vehicles is often neglected in these decisions. In this paper, a cloud-edge-vehicle three-tier vehicular edge computing (VEC) system is proposed, where vehicles partially offload their computing tasks to edge or cloud servers while keeping the remaining tasks local to the vehicle terminals. Under the restrictions of vehicle mobility and discrete variables, task scheduling and task offloading proportion are jointly optimized with the objective of minimizing the total system cost. Considering the non-convexity, high-dimensional complex state and continuous action space requirements of the optimization problem, we propose a task offloading decision-making algorithm based on deep deterministic policy gradient (TODM_DDPG). TODM_DDPG algorithm adopts the actor–critic framework in which the actor network outputs floating point numbers to represent deterministic policy, while the critic network evaluates the action output by the actor network, and adjusts the network evaluation policy according to the rewards with the environment to maximize the long-term reward. To explore the algorithm performance, this conduct parameter setting experiments to correct the algorithm core hyper-parameters and select the optimal combination of parameters. In addition, in order to verify algorithm performance, we also carry out a series of comparative experiments with baseline algorithms. The results demonstrate that in terms of reducing system costs, the proposed algorithm outperforms the compared baseline algorithm, such as the deep Q network (DQN) and the actor–critic (AC), and the performance is improved by about 13% on average.
Published: 2023
Full Text: View/download PDF

49. A Multi-Task Fusion Strategy-Based Decision-Making and Planning Method for Autonomous Driving Vehicles

Author: Weiguo Liu, Zhiyu Xiang, Han Fang, Ke Huo, and Zixu Wang
Subjects: deep reinforcement learning, decision-making planning, multi-task fusion, DDPG, simulation platform, end-to-end, Chemical technology, TP1-1185
Abstract: The autonomous driving technology based on deep reinforcement learning (DRL) has been confirmed as one of the most cutting-edge research fields worldwide. The agent is enabled to achieve the goal of making independent decisions by interacting with the environment and learning driving strategies based on the feedback from the environment. This technology has been widely used in end-to-end driving tasks. However, this field faces several challenges. First, developing real vehicles is expensive, time-consuming, and risky. To further expedite the testing, verification, and iteration of end-to-end deep reinforcement learning algorithms, a joint simulation development and validation platform was designed and implemented in this study based on VTD–CarSim and the Tensorflow deep learning framework, and research work was conducted based on this platform. Second, sparse reward signals can cause problems (e.g., a low-sample learning rate). It is imperative for the agent to be capable of navigating in an unfamiliar environment and driving safely under a wide variety of weather or lighting conditions. To address the problem of poor generalization ability of the agent to unknown scenarios, a deep deterministic policy gradient (DDPG) decision-making and planning method was proposed in this study in accordance with a multi-task fusion strategy. The main task based on DRL decision-making planning and the auxiliary task based on image semantic segmentation were cross-fused, and part of the network was shared with the main task to reduce the possibility of model overfitting and improve the generalization ability. As indicated by the experimental results, first, the joint simulation development and validation platform built in this study exhibited prominent versatility. Users were enabled to easily substitute any default module with customized algorithms and verify the effectiveness of new functions in enhancing overall performance using other default modules of the platform. Second, the deep reinforcement learning strategy based on multi-task fusion proposed in this study was competitive. Its performance was better than other DRL algorithms in certain tasks, which improved the generalization ability of the vehicle decision-making planning algorithm.
Published: 2023
Full Text: View/download PDF

50. Optimal Reactive Power Dispatch in ADNs using DRL and the Impact of Its Various Settings and Environmental Changes

Author: Tassneem Zamzam, Khaled Shaban, and Ahmed Massoud
Subjects: active distribution network, optimal reactive power dispatch, deep reinforcement learning, reward functions, hyperparameters, neural network, Chemical technology, TP1-1185
Abstract: Modern active distribution networks (ADNs) witness increasing complexities that require efforts in control practices, including optimal reactive power dispatch (ORPD). Deep reinforcement learning (DRL) is proposed to manage the network’s reactive power by coordinating different resources, including distributed energy resources, to enhance performance. However, there is a lack of studies examining DRL elements’ performance sensitivity. To this end, in this paper we examine the impact of various DRL reward representations and hyperparameters on the agent’s learning performance when solving the ORPD problem for ADNs. We assess the agent’s performance regarding accuracy and training time metrics, as well as critic estimate measures. Furthermore, different environmental changes are examined to study the DRL model’s scalability by including other resources. Results show that compared to other representations, the complementary reward function exhibits improved performance in terms of power loss minimization and convergence time by 10–15% and 14–18%, respectively. Also, adequate agent performance is observed to be neighboring the best-suited value of each hyperparameter for the studied problem. In addition, scalability analysis depicts that increasing the number of possible action combinations in the action space by approximately nine times results in 1.7 times increase in the training time.
Published: 2023
Full Text: View/download PDF

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

Publisher

227 results on '"Deep Reinforcement Learning"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources