101. Energy Management of Hybrid UAV Based on Reinforcement Learning
- Authors
- Jianguo Mao, Huan Shen, Yao Zhang, Linwei Wu, and Zhiwei Yan
- Subjects
Mathematical optimization, Reinforcement learning, TK7800-8360, Computer Networks and Communications, Computer science, Energy management, Process (computing), Computer applications in other systems, Hybrid UAV, Energy management strategy, Rate of convergence, Hardware and Architecture, Control and Systems Engineering, Robustness (computer science), Bellman equation, Signal Processing, Convergence (routing), State space, Electrical and Electronic Engineering, Electronics, Algorithm improvement
- Abstract
To address the limited flight time of Unmanned Aerial Vehicles (UAVs), this paper proposes a set of energy management strategies based on reinforcement learning for a hybrid agricultural UAV. The battery is used to optimize the operating point of the internal combustion engine as far as possible while meeting the UAV's high power demand and compensating for the engine's slow response. First, a decision-oriented hybrid powertrain model and a UAV dynamic model are established. Because the energy management strategy (EMS) is based on reinforcement learning (RL), an intelligent optimization method that has emerged in recent years, complex theoretical derivation is avoided in the modeling process. For the EMS itself, a double Q-learning algorithm with strong convergence properties is adopted. The algorithm separates the state-action value function used to derive decisions from the state-action value function updated by those decisions, thereby avoiding the delay and oscillation in convergence caused by maximization bias. After this improvement, offline training is carried out on a large volume of previously recorded flight data. The simulation results demonstrate that, by virtue of the exploration strategy proposed in this paper, the improved algorithm achieves better performance at a lower learning cost than before. In the state space, time-based and residual-fuel-based state selections are evaluated in turn, and their convergence rates and application effects are compared and analyzed. The results show that, with an appropriate choice of state space for different types of operating cycles, the learning algorithm attains stronger robustness and faster convergence. After 120,000 training cycles, the fuel economy of the improved algorithm reaches more than 90% of that of the optimal solution and performs stably in actual flight.
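The decoupling of decision-making and evaluation that the abstract describes is the core of the standard double Q-learning update (van Hasselt, 2010). The sketch below is a minimal tabular illustration of that update, not the paper's implementation: the state encoding, the discretized action set, the reward, and all hyperparameter values are assumptions made for the example.

```python
import random
from collections import defaultdict

# Minimal tabular double Q-learning sketch. Two value tables are kept:
# one table selects the greedy action, the other evaluates it, which
# suppresses the maximization bias of single-table Q-learning.
# ACTIONS, the state encoding, and all constants are illustrative.

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = range(5)  # hypothetical discretized engine power levels

q_a = defaultdict(float)  # table used to derive decisions
q_b = defaultdict(float)  # table updated against the other's choice

def choose_action(state):
    """Epsilon-greedy over the sum of both tables."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: q_a[(state, a)] + q_b[(state, a)])

def update(state, action, reward, next_state):
    """With probability 0.5 update one table, using the greedy action
    chosen by that table but valued by the other table."""
    if random.random() < 0.5:
        best = max(ACTIONS, key=lambda a: q_a[(next_state, a)])
        target = reward + GAMMA * q_b[(next_state, best)]
        q_a[(state, action)] += ALPHA * (target - q_a[(state, action)])
    else:
        best = max(ACTIONS, key=lambda a: q_b[(next_state, a)])
        target = reward + GAMMA * q_a[(next_state, best)]
        q_b[(state, action)] += ALPHA * (target - q_b[(state, action)])
```

In an EMS setting such as the one described here, the state would plausibly encode quantities like battery state of charge and power demand, and the reward would penalize fuel consumption; those mappings are problem-specific and are left abstract above.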
- Published
- 2021