Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization.

Authors :: Dong, Kun
Luo, Yongle
Wang, Yuxin
Liu, Yu
Qu, Chengeng
Zhang, Qiang
Cheng, Erkang
Sun, Zhiyong
Song, Bo
Source :: Knowledge-Based Systems. Mar2024, Vol. 287, pN.PAG-N.PAG. 1p.
Publication Year :: 2024
Abstract: Dyna-style Model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared to their model-free counterparts, largely attributable to the leverage of learned models. Despite these advancements, the effective application of these learned models remains challenging, largely due to the intricate interdependence between model learning and policy optimization, which presents a significant theoretical gap in this field. This paper bridges this gap by providing a comprehensive theoretical analysis of Dyna-style MBRL for the first time and establishing a return bound in deterministic environments. Building upon this analysis, we propose a novel schema called Model-Based Reinforcement Learning with Model-Free Policy Optimization (MBMFPO). Compared to existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, along with some additional techniques. Experimental results on various continuous control tasks demonstrate that MBMFPO can significantly enhance sample efficiency and final performance compared to baseline methods. Furthermore, extensive ablation studies provide robust evidence for the effectiveness of each individual component within the MBMFPO schema. This work advances both the theoretical analysis and practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
• This paper makes an in-depth analysis of the monotonicity guarantee for the Dyna-style algorithm in the deterministic environment.
• A practical schema called MBMFPO is proposed to improve policy performance in real environments.
• Experimental results corroborate that the policy trained with MBMFPO schema outperforms the baseline methods in terms of sample efficiency and asymptotic performance.
• Further experiments have been conducted to validate the efficacy of the individual components encompassed within the MBMFPO schema.
[ABSTRACT FROM AUTHOR]
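
For readers unfamiliar with the Dyna-style loop the abstract refers to, the sketch below shows the classic tabular Dyna-Q pattern: a model-free update on each real transition, a learned model of the environment, and extra updates on transitions replayed from that model. It is not the MBMFPO schema from the paper, whose details this record does not give. The chain environment, the function name `dyna_q`, and all hyperparameters are assumptions chosen only for illustration; the environment is deterministic, matching the setting named in the abstract.

```python
import random


def dyna_q(n_states=6, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical deterministic chain (illustrative only)."""
    rng = random.Random(seed)
    actions = (0, 1)  # 0 = move left, 1 = move right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned model: (state, action) -> (reward, next_state, done)

    def step(s, a):
        # Assumed environment: reward 1 on reaching the rightmost state.
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return (1.0 if done else 0.0), s2, done

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for real and simulated data alike.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # model-free update on real data
            model[(s, a)] = (r, s2, done)    # model learning
            for _ in range(planning_steps):  # planning on model-generated data
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


if __name__ == "__main__":
    q_values = dyna_q()
    for state in range(5):
        print(state, round(q_values[(state, 0)], 3), round(q_values[(state, 1)], 3))
```

Because the learned model here is an exact lookup table, the planning updates are as reliable as the real ones; with an approximate model, as in continuous control, model error would feed into the policy updates, which is the interdependence the abstract describes.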

Details

Language :: English
ISSN :: 0950-7051
Volume :: 287
Database :: Academic Search Index
Journal :: Knowledge-Based Systems
Publication Type :: Academic Journal
Accession number :: 175457116
Full Text :: https://doi.org/10.1016/j.knosys.2024.111428
