Plume Tracing via Model-Free Reinforcement Learning Method.

Authors :: Hu, Hangkai
Song, Shiji
Chen, C. L. Phillip
Source :: IEEE Transactions on Neural Networks & Learning Systems; Aug2019, Vol. 30 Issue 8, p2515-2527, 13p
Publication Year :: 2019
Abstract: This paper studies the plume-tracing strategy for an autonomous underwater vehicle (AUV) in a deep-sea turbulent environment. The tracing problem is modeled as a partially observable Markov decision process with continuous state and action spaces, owing to the spatio-temporal changes of the environment. A long short-term memory-based reinforcement learning framework that makes full use of history information is proposed to generate a smooth strategy while the AUV interacts with the environment. Continuous temporal difference and deterministic policy gradient methods are employed to improve the strategy. To improve the performance of the algorithm, a supervised strategy generated by dynamic programming methods is used as prior knowledge for the agent. The form of the historical searching trajectory and the exploration technique are specially designed to fit the algorithm. Simulation environments are established based on the Reynolds-averaged Navier–Stokes equations, and the effectiveness of the learned plume-tracing strategy is validated with simulation experiments. [ABSTRACT FROM AUTHOR]