On-Policy Model Errors in Reinforcement Learning
- Publication Year :
- 2021
- Publisher :
- arXiv, 2021.
Abstract
- Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or suboptimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement. Experiments on MuJoCo and PyBullet benchmarks show that our method can drastically improve existing model-based approaches without introducing additional tuning parameters.
- Comment: Published at The Tenth International Conference on Learning Representations (ICLR 2022)
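- The sketch below illustrates the core idea described in the abstract: compute time-dependent residuals between real on-policy transitions and the learned model's predictions, then add those residuals back during model rollouts. It is a minimal illustration only, not the authors' implementation; the names `model`, `policy`, `real_states`, and `real_actions` are hypothetical placeholders.

```python
import numpy as np

def rollout_with_on_policy_corrections(model, real_states, real_actions, policy):
    """Model rollout with per-step residual corrections from a real trajectory.

    model(s, a)     -- hypothetical learned dynamics model returning the next state
    real_states[t]  -- observed states s_0..s_T from the real environment
    real_actions[t] -- actions a_0..a_{T-1} executed while collecting real_states
    policy(s)       -- policy choosing (possibly different) actions for the rollout
    """
    # Time-dependent correction terms: residual between the real next state
    # and the model's prediction along the real on-policy trajectory.
    corrections = [
        real_states[t + 1] - model(real_states[t], real_actions[t])
        for t in range(len(real_actions))
    ]

    # Roll out from the real initial state, adding the correction for the
    # matching time step. If the rollout replays the original actions, it
    # reproduces the real data exactly, so no model error accumulates over
    # the horizon; for different actions, the model only has to generalize
    # the effect of the action change.
    state = np.asarray(real_states[0])
    rollout = [state]
    for t in range(len(corrections)):
        action = policy(state)
        state = model(state, action) + corrections[t]
        rollout.append(state)
    return rollout
```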
- Subjects :
- FOS: Computer and information sciences
- Computer Science - Machine Learning
- Computer Science - Robotics
- FOS: Electrical engineering, electronic engineering, information engineering
- Systems and Control (eess.SY)
- Electrical Engineering and Systems Science - Systems and Control
- Robotics (cs.RO)
- Machine Learning (cs.LG)
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....13f7d3b4412cd29012e0ea6ed3b3ff51
- Full Text :
- https://doi.org/10.48550/arxiv.2110.07985