1. Balanced prioritized experience replay in off-policy reinforcement learning.
- Author
- Lou, Zhouwei; Wang, Yiye; Shan, Shuo; Zhang, Kanjian; Wei, Haikun
- Subjects
REINFORCEMENT learning, ALGORITHMS
- Abstract
In off-policy reinforcement learning (RL), the experience imbalance problem can affect learning performance. The experience imbalance problem refers to the phenomenon that the experiences obtained by the agent during the learning process are unevenly distributed in the state space, resulting in the agent's inability to accurately estimate the value of each potential state. This problem is typically caused by environments with high-dimensional state and action spaces, as well as the exploration–exploitation mechanism inherent in RL. This article proposes a balanced prioritized experience replay (BPER) algorithm based on experience rarity. First, an evaluation metric to quantify experience rarity is defined. Then, the sampling priority of each experience is calculated according to this metric. Finally, prioritized experience replay is performed according to the sampling priority. BPER increases the sampling frequency of high-rarity experiences and decreases the sampling frequency of low-rarity experiences, enabling the agent to learn more comprehensive knowledge. We evaluate BPER on a series of MuJoCo continuous control tasks. Experimental results show that BPER can effectively improve the learning performance while mitigating the impact of the experience imbalance problem. [ABSTRACT FROM AUTHOR]
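The three steps named in the abstract (rarity metric, sampling priority, prioritized sampling) can be illustrated with a minimal sketch. The abstract does not define the rarity metric, so the version below is only an assumption: it treats an experience as rare when few stored states fall into the same coarse grid cell. The function names `rarity_priorities` and `sample_batch`, and the parameters `bin_width` and `alpha`, are hypothetical and are not taken from the article.

```python
import numpy as np


def rarity_priorities(states, bin_width=0.5, alpha=1.0):
    """Return sampling probabilities that favor rarely visited state regions.

    ASSUMPTION: rarity is the inverse number of stored states sharing a
    grid cell. This is a stand-in, not the metric defined in the article.
    `states` is expected to have shape (num_experiences, state_dim).
    """
    states = np.asarray(states, dtype=float)
    # Step 1: quantify rarity by discretizing states and counting cell occupancy.
    cells = np.floor(states / bin_width).astype(np.int64)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # flatten for consistency across NumPy versions
    rarity = 1.0 / counts[inverse]
    # Step 2: turn rarity into a sampling priority (alpha sharpens or flattens it).
    priorities = rarity ** alpha
    return priorities / priorities.sum()


def sample_batch(probs, batch_size, rng):
    """Step 3: draw replay indices in proportion to the sampling priority."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


# Eight experiences near the origin and two in a rarely visited region.
states = [[0.0, 0.0]] * 8 + [[5.0, 5.0]] * 2
probs = rarity_priorities(states)
batch = sample_batch(probs, batch_size=4, rng=np.random.default_rng(0))
```

With `alpha=1.0`, each occupied cell receives the same total probability mass under this assumed metric, so the two rare experiences are drawn as often, in aggregate, as the eight common ones.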
- Published
- 2024