1. Parameter-Exploring Policy Gradient Algorithm Based on Value Function Estimation.
- Author
- 赵婷婷, 杨梦楠, 陈亚瑞, 王嫄, and 杨巨成
- Subjects
- APPROXIMATION algorithms, SOCIAL problems, REINFORCEMENT learning, ALGORITHMS
- Abstract
Policy gradient algorithms suffer from large variance in their gradient estimates. The algorithm of policy gradients with parameter-based exploration (PGPE) mitigates this problem to some extent. However, PGPE estimates its gradient by Monte Carlo sampling, which requires a large number of samples to achieve a fairly stable policy update and thus hinders its application to real-world problems. To further reduce the variance of the policy gradient, the proposed algorithm, function approximation for policy gradients with parameter-based exploration (PGPE-FA), implements PGPE in the Actor-Critic framework. More specifically, the proposed method utilizes a value function to estimate the policy gradient, instead of using trajectory samples as PGPE does, thereby reducing the variance of the gradient estimation. Finally, experiments verify that the proposed algorithm can reduce the variance of the gradient estimation. [ABSTRACT FROM AUTHOR]
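To make the baseline method concrete, below is a minimal sketch of the plain Monte Carlo PGPE update the abstract refers to, on a hypothetical toy problem (a scalar-parameter deterministic policy whose return peaks at theta = 2.0). The function name `pgpe_update`, the toy reward, and all hyperparameters are illustrative assumptions, not the paper's implementation; the paper's PGPE-FA variant would further replace the sampled returns with a critic's value estimates.

```python
import numpy as np

def pgpe_update(mu, sigma, n_samples, reward_fn, lr=0.1, rng=None):
    """One Monte Carlo PGPE step (illustrative sketch, not the paper's code).

    Policy parameters theta are drawn from N(mu, sigma^2); the gradient of the
    expected return w.r.t. (mu, sigma) is estimated from sampled returns, with
    the mean return used as a baseline to reduce variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    thetas = rng.normal(mu, sigma, size=n_samples)
    returns = np.array([reward_fn(t) for t in thetas])
    adv = returns - returns.mean()  # baseline-subtracted returns
    # Score-function (likelihood-ratio) gradients of the Gaussian sampler
    grad_mu = np.mean(adv * (thetas - mu) / sigma**2)
    grad_sigma = np.mean(adv * ((thetas - mu) ** 2 - sigma**2) / sigma**3)
    # Gradient ascent; keep sigma positive with a small floor
    return mu + lr * grad_mu, max(sigma + lr * grad_sigma, 1e-3)

# Hypothetical toy problem: the return is maximized at theta = 2.0
reward = lambda theta: -(theta - 2.0) ** 2

mu, sigma = 0.0, 1.0
rng = np.random.default_rng(0)
for _ in range(200):
    mu, sigma = pgpe_update(mu, sigma, 32, reward, rng=rng)
# mu drifts toward the optimum 2.0 as sigma anneals toward the floor
```

The `adv * (...)` terms are exactly where the variance the abstract discusses enters: with few samples the Monte Carlo returns fluctuate heavily, which is the motivation for substituting a learned value function in the Actor-Critic formulation.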
- Published
- 2023