Reward-free exploration beyond finite-horizon
- Source :
- ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning, 2020, Vienna, Austria
- Publication Year :
- 2020
- Publisher :
- HAL CCSD, 2020.
Abstract
- We consider the reward-free exploration framework introduced by Jin et al. (2020), where an RL agent interacts with an unknown environment without any explicit reward function to maximize. The objective is to collect enough information during the exploration phase so that a near-optimal policy can be immediately computed once any reward function is provided. In this paper, we move from the finite-horizon setting studied by Jin et al. (2020) to the more general setting of goal-conditioned RL, often referred to as stochastic shortest path (SSP). We first discuss the challenges specific to SSPs and then study two scenarios: 1) reward-free goal-free exploration in communicating MDPs, and 2) reward-free goal-free incremental exploration in non-communicating MDPs where the agent is provided with a reset action to an initial state. In both cases, we provide exploration algorithms and their sample-complexity bounds, which we contrast with the existing guarantees in the finite-horizon case.
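To make the two-phase protocol from the abstract concrete, here is a minimal sketch for a small tabular finite-horizon MDP: an exploration phase that collects transition counts without observing any reward, followed by a planning phase that runs value iteration on the estimated model once a reward function is revealed. All names (explore, plan_with_reward, the uniform exploration policy) are hypothetical illustrations of the framework, not the algorithms analyzed in the paper.

```
# Minimal sketch of reward-free exploration: explore without rewards,
# then plan on the learned model once a reward function is provided.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 5, 2, 10
# Hypothetical ground-truth transition kernel: P[s, a] is a distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))


def explore(n_episodes):
    """Exploration phase: interact with the MDP and collect transition counts only."""
    counts = np.zeros((n_states, n_actions, n_states))
    for _ in range(n_episodes):
        s = 0  # initial state
        for _ in range(horizon):
            a = rng.integers(n_actions)               # uniform exploration, a stand-in
            s_next = rng.choice(n_states, p=P[s, a])  # for a principled exploration strategy
            counts[s, a, s_next] += 1
            s = s_next
    # Empirical transition model (uniform fallback for unvisited state-action pairs).
    totals = counts.sum(axis=-1, keepdims=True)
    P_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)
    return P_hat


def plan_with_reward(P_hat, reward):
    """Planning phase: given a reward r(s, a), run finite-horizon value iteration
    on the estimated model to obtain a (hopefully near-optimal) policy."""
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = reward + P_hat @ V        # Q[s, a] = r(s, a) + E_{s'~P_hat(.|s,a)}[V(s')]
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


P_hat = explore(n_episodes=2000)
reward = rng.random((n_states, n_actions))  # reward revealed only after exploration
policy, V = plan_with_reward(P_hat, reward)
print("estimated value of the initial state:", V[0])
```

The sketch uses the finite-horizon setting of Jin et al. (2020) for simplicity; the paper's contribution is to extend this kind of guarantee to the stochastic shortest path setting, where episodes end only when a goal is reached.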
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning, 2020, Vienna, Austria
- Accession number :
- edsair.dedup.wf.001..54f45fad43f44331c91b73830abf5830