Unified curiosity-driven learning with smoothed intrinsic reward estimation.

Authors :: Huang, Fuxian
Li, Weichao
Cui, Jiabao
Fu, Yongjian
Li, Xi
Source :: Pattern Recognition. Mar2022, Vol. 123, pN.PAG-N.PAG. 1p.
Publication Year :: 2022
Abstract: • We propose a novel distribution-aware and policy-aware curiosity-driven learning framework that unifies state novelty and state-action novelty. Distribution-aware weighting (DAW) enables the agent to explore states diversely, and policy-aware weighting (PAW) encourages the agent to explore states in which the policy is uncertain about which action to take. The proposed approach improves the exploration ability of RL by providing a complete intrinsic reward.
• We propose to improve the robustness of policy learning by smoothing the intrinsic reward over a batch of transitions close to the current transition, and to employ an attention module to extract task-relevant features for a more precise estimation of intrinsic reward.
• Extensive experiments on Atari games demonstrate the effectiveness of our approach.
In reinforcement learning (RL), intrinsic reward estimation is necessary for policy learning when the extrinsic reward is sparse or absent. To this end, Unified Curiosity-driven Learning with Smoothed intrinsic reward Estimation (UCLSE) is proposed to address the sparse extrinsic reward problem from the perspective of the completeness of intrinsic reward estimation. We further propose a state distribution-aware weighting method and a policy-aware weighting method to dynamically unify two mainstream intrinsic reward estimation methods. In this way, the agent can explore the environment more effectively and efficiently. Under this framework, we propose to employ an attention module to extract task-relevant features for a more precise estimation of intrinsic reward. Moreover, we propose to improve the robustness of policy learning by smoothing the intrinsic reward over a batch of transitions close to the current transition. Extensive experimental results on Atari games demonstrate that our method outperforms state-of-the-art approaches in terms of both score and training efficiency. [ABSTRACT FROM AUTHOR]