
Parameter-based exploration policy gradient algorithm with value function estimation (基于值函数估计的参数探索策略梯度算法).

Authors :
赵婷婷
杨梦楠
陈亚瑞
王嫄
杨巨成
Source :
Application Research of Computers / Jisuanji Yingyong Yanjiu. Aug 2023, Vol. 40, Issue 8, p2405-2410. 6p.
Publication Year :
2023

Abstract

Policy gradient algorithms suffer from large variance in gradient estimation. Policy gradients with parameter-based exploration (PGPE) mitigates this problem to some extent. However, PGPE estimates its gradient by Monte Carlo sampling, which requires a large number of rollouts to achieve a reasonably stable policy update and thus hinders its application to real-world problems. To further reduce the variance of the policy gradient, the proposed algorithm, function approximation for policy gradients with parameter-based exploration (PGPE-FA), implements PGPE within the actor-critic framework. Specifically, the method uses a value function to estimate the policy gradient, instead of estimating it from trajectory samples as PGPE does, thereby reducing the variance of gradient estimation. Experiments verify that the proposed algorithm reduces the variance of gradient estimation. [ABSTRACT FROM AUTHOR]
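The contrast the abstract draws is between PGPE's Monte Carlo return estimate and a value-function-based estimate of the same gradient. The sketch below is a minimal illustration of the standard PGPE likelihood-ratio update for a Gaussian hyper-policy N(mu, sigma^2), with an optional critic callable standing in for the value-function idea; the toy 1-D environment, the hyperparameter values, and the parameter-scoring critic are all illustrative assumptions and do not reproduce the paper's actual PGPE-FA implementation or experiments (the paper's critic is a value function learned in the actor-critic framework, not a function of the sampled parameters).

import numpy as np

def rollout_return(theta, episode_len=50, rng=None):
    """Hypothetical 1-D regulation task: the deterministic policy a = theta * s
    is rewarded for keeping the state near zero (negative quadratic cost)."""
    rng = np.random.default_rng() if rng is None else rng
    s, ret = 1.0, 0.0
    for _ in range(episode_len):
        a = theta * s                              # deterministic policy for a fixed theta
        s = 0.9 * s - a + 0.05 * rng.standard_normal()
        ret += -s ** 2
    return ret

def pgpe_update(mu, sigma, n_samples=20, lr=1e-2, rng=None, critic=None):
    """One PGPE update of the hyper-parameters (mu, sigma).

    With critic=None each sampled parameter is scored by its Monte Carlo
    return, which is the high-variance estimator the abstract refers to.
    Passing a critic callable (a stand-in assumption, not the paper's
    actual value function) scores the samples with a learned estimate
    instead, illustrating the variance-reduction idea."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = mu + sigma * rng.standard_normal(n_samples)   # theta ~ N(mu, sigma^2)
    if critic is None:
        scores = np.array([rollout_return(t, rng=rng) for t in thetas])
    else:
        scores = np.array([critic(t) for t in thetas])
    adv = scores - scores.mean()                           # baseline for variance reduction
    # Likelihood-ratio gradients of the Gaussian hyper-distribution.
    grad_mu = np.mean(adv * (thetas - mu) / sigma ** 2)
    grad_sigma = np.mean(adv * ((thetas - mu) ** 2 - sigma ** 2) / sigma ** 3)
    return mu + lr * grad_mu, max(sigma + lr * grad_sigma, 1e-3)  # keep sigma positive

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 0.5
    for _ in range(200):
        mu, sigma = pgpe_update(mu, sigma, rng=rng)
    print(f"learned gain mu={mu:.3f}, exploration sigma={sigma:.3f}")

In this toy setup the Monte Carlo scores are noisy sums over 50 steps, so the hyper-parameter updates fluctuate from one iteration to the next; replacing those sums with a smoother learned estimate is the mechanism by which the paper's actor-critic formulation reduces the variance of the gradient estimate.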

Details

Language :
Chinese
ISSN :
1001-3695
Volume :
40
Issue :
8
Database :
Academic Search Index
Journal :
Application Research of Computers / Jisuanji Yingyong Yanjiu
Publication Type :
Academic Journal
Accession number :
169933062
Full Text :
https://doi.org/10.19734/j.issn.1001-3695.2022.11.0781