Efficient Reinforcement Learning With the Novel N-Step Method and V-Network
- Source:
- IEEE Transactions on Cybernetics, October 2024, Vol. 54, Issue 10, pp. 6048–6057
- Publication Year:
- 2024
Abstract
- The application of reinforcement learning (RL) in artificial intelligence has become increasingly widespread. However, its drawbacks are also apparent: it requires a large number of samples, making the improvement of sample efficiency a research focus. To address this issue, we propose a novel N-step method. This method extends the agent's horizon, enabling it to acquire more long-term effective information and thus addressing data inefficiency in RL. Additionally, the N-step method reduces the estimation variance of the Q-function, one of the factors contributing to Q-function estimation errors. Besides high variance, estimation bias is the other major source of such errors. To mitigate the estimation bias of the Q-function, we design a regularization method based on the V-function, an approach that has been underexplored. The combination of these two methods addresses the problems of low sample efficiency and inaccurate Q-function estimation in RL. Finally, extensive experiments in discrete and continuous action spaces demonstrate that the proposed N-step method, when combined with the classical deep Q-network (DQN), deep deterministic policy gradient (DDPG), and TD3 algorithms, is effective, consistently outperforming the classical algorithms.
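For orientation, the classical n-step return that such methods build on can be sketched as below. This is a minimal illustration, not the paper's novel variant (whose exact formulation is not given in the abstract); the function name, toy rewards, and bootstrap value are all illustrative assumptions.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, n=3):
    """Compute the classical n-step TD target:
        G_t = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1}
              + gamma^n * V(s_{t+n})
    `rewards` holds up to n rewards from step t onward;
    `bootstrap_value` is the critic's value estimate at state s_{t+n}.
    Looking n steps ahead trades a little bias for lower variance
    than a full Monte Carlo return, and faster credit assignment
    than a 1-step target.
    """
    steps = min(n, len(rewards))        # truncate at episode end
    g = 0.0
    for k in range(steps):
        g += (gamma ** k) * rewards[k]  # discounted observed rewards
    g += (gamma ** steps) * bootstrap_value  # bootstrap from the value net
    return g

# Example: three rewards of 1.0, then bootstrap from V(s_{t+3}) = 10.0
target = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.99, n=3)
```

In a DQN- or TD3-style agent, this target would replace the 1-step target in the critic's regression loss; the abstract additionally proposes a V-function-based regularizer to curb estimation bias, which this sketch does not cover.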
Details
- Language:
- English
- ISSN:
- 2168-2267
- Volume:
- 54
- Issue:
- 10
- Database:
- Supplemental Index
- Journal:
- IEEE Transactions on Cybernetics
- Publication Type:
- Periodical
- Accession number:
- ejs67653271
- Full Text:
- https://doi.org/10.1109/TCYB.2024.3401014