Research on an Improved DDPG Algorithm Based on LSTM and an Asymmetric Network (基于 LSTM 与非对称网络的改进 DDPG 算法研究)
- Author
- 何富君, 王晓争, and 刘凯
- Subjects
- Deep learning, Machine learning, Reinforcement learning, Algorithms, Speed, Critics
- Abstract
When a deep reinforcement learning algorithm is trained in a complex dynamic environment, the partial observability of the environment makes it difficult for the agent to obtain useful information, leading to typical problems such as failure to learn a good policy and slow convergence. This paper proposes an improved DDPG algorithm based on LSTM and an asymmetric actor-critic network. The method introduces an LSTM structure into the actor-critic network to learn the hidden state of the partially observable Markov decision process through memory-based reasoning. At the same time, while the actor network uses only RGB images as its partially observable input, the critic network is trained on the complete state of the simulation environment, forming an asymmetric network that speeds up training convergence. A simulated manipulator-grasping experiment in ROS shows that, compared with DDPG, PPO, and LSTM-DDPG, the proposed algorithm achieves a higher success rate and faster convergence.
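The asymmetric structure the abstract describes can be sketched in a few lines of PyTorch: an actor that encodes a sequence of RGB observations with a CNN followed by an LSTM, and a critic that scores actions against the full simulator state. This is a minimal illustration only; the class names (LSTMActor, FullStateCritic) and all layer sizes are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Actor: partially observable RGB sequence -> CNN features -> LSTM -> action."""
    def __init__(self, action_dim, feat_dim=256, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.proj = nn.LazyLinear(feat_dim)          # infers CNN output size on first call
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, action_dim), nn.Tanh())

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (B, T, 3, H, W) — a sequence of RGB frames per trajectory
        b, t = obs_seq.shape[:2]
        feats = self.proj(self.cnn(obs_seq.flatten(0, 1))).view(b, t, -1)
        out, hidden = self.lstm(feats, hidden)       # memory over the partial observations
        return self.head(out), hidden                # actions in [-1, 1] for each step

class FullStateCritic(nn.Module):
    """Critic: trained on the complete simulator state (the asymmetric input)."""
    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.q(torch.cat([state, action], dim=-1))

# Smoke test with made-up dimensions: batch of 2 trajectories, 5 frames each.
actor = LSTMActor(action_dim=6)
critic = FullStateCritic(state_dim=20, action_dim=6)
acts, _ = actor(torch.randn(2, 5, 3, 64, 64))        # -> (2, 5, 6)
q = critic(torch.randn(2, 20), acts[:, -1])           # Q-value from the full state
```

In a DDPG-style training loop, only the actor (and its LSTM memory) would be deployed at execution time; the full-state critic exists purely to provide a better-informed training signal in simulation, which is what the abstract credits for the faster convergence.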
- Published
- 2022