Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization.
- Source :
- Knowledge-Based Systems. Mar 2024, Vol. 287, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2024
Abstract
- Dyna-style Model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared to their model-free counterparts, largely attributable to the leverage of learned models. Despite these advancements, the effective application of these learned models remains challenging, largely due to the intricate interdependence between model learning and policy optimization, which presents a significant theoretical gap in this field. This paper bridges this gap by providing a comprehensive theoretical analysis of Dyna-style MBRL for the first time and establishing a return bound in deterministic environments. Building upon this analysis, we propose a novel schema called Model-Based Reinforcement Learning with Model-Free Policy Optimization (MBMFPO). Compared to existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, along with some additional techniques. Experimental results on various continuous control tasks demonstrate that MBMFPO can significantly enhance sample efficiency and final performance compared to baseline methods. Furthermore, extensive ablation studies provide robust evidence for the effectiveness of each individual component within the MBMFPO schema. This work advances both the theoretical analysis and practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
• This paper makes an in-depth analysis of the monotonicity guarantee for the Dyna-style algorithm in the deterministic environment.
• A practical schema called MBMFPO is proposed to improve policy performance in real environments.
• Experimental results corroborate that the policy trained with MBMFPO schema outperforms the baseline methods in terms of sample efficiency and asymptotic performance.
• Further experiments have been conducted to validate the efficacy of the individual components encompassed within the MBMFPO schema.
[ABSTRACT FROM AUTHOR]
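For readers unfamiliar with the term, the general Dyna-style loop the abstract refers to can be illustrated with a minimal tabular Dyna-Q sketch. This is not the MBMFPO schema from the paper, whose details are not given in this record. The six-state chain environment, the function names (`step`, `dyna_q`, `greedy_policy`), and all hyperparameters are assumptions chosen only for illustration. The sketch shows the three interleaved parts of a Dyna-style method in a deterministic environment: a model-free update from real transitions, model learning, and extra updates from transitions replayed out of the learned model.

```python
import random

N_STATES = 6      # hypothetical toy chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics; reward 1.0 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (state, action) -> (next_state, reward, done)

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for both real and simulated data.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)      # model-free update from real experience
            model[(s, a)] = (s2, r, done)  # model learning (exact: deterministic env)
            for _ in range(planning_steps):
                # Planning: replay transitions sampled from the learned model.
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(dyna_q())` returns the learned action for each non-terminal state. In this toy setting, because the environment is deterministic, a single observed transition is enough for the tabular model to be exact for that state-action pair, so the planning updates introduce no model error. With a learned function-approximation model, as in the continuous control tasks the abstract mentions, that would no longer hold.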
Details
- Language :
- English
- ISSN :
- 09507051
- Volume :
- 287
- Database :
- Academic Search Index
- Journal :
- Knowledge-Based Systems
- Publication Type :
- Academic Journal
- Accession number :
- 175457116
- Full Text :
- https://doi.org/10.1016/j.knosys.2024.111428