
Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Authors :
Wang, Guojian
Wu, Faguo
Zhang, Xiao
Guo, Ning
Zheng, Zhiming
Source :
Knowledge-Based Systems 285 (2024) 111334
Publication Year :
2023

Abstract

Deep reinforcement learning (DRL) faces significant challenges in hard-exploration problems: tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods rely on complex architectures to estimate state novelty or introduce sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the agent's policy away from suboptimal solutions by leveraging incomplete offline demonstrations as references. This approach gradually expands the agent's exploration scope and strives for optimality in a constrained-optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptively clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a derivation of worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid-world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic, suboptimal behaviors in both single- and multi-agent settings; quantitative metrics further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.

Comment: 35 pages, 36 figures; accepted by Knowledge-Based Systems, not yet published.
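The core idea of a clipped trajectory-distance reward can be sketched as follows. This is a minimal illustration of the general mechanism only, not the authors' TACE implementation; the function names (`trajectory_distance_bonus`, `shaped_reward`) and the parameters `alpha` and `clip_max` are hypothetical, and Euclidean state distance is an assumed choice of metric.

```python
import numpy as np

def trajectory_distance_bonus(state, demo_trajectories, clip_max=1.0):
    """Distance from the current state to the nearest state in a set of
    reference (suboptimal) demonstration trajectories. A larger distance
    yields a larger bonus, pushing the policy away from the demonstrated
    suboptimal behavior; the bonus is clipped at clip_max so it cannot
    dominate the task reward."""
    d_min = min(
        np.linalg.norm(state - s)
        for traj in demo_trajectories
        for s in traj
    )
    return min(d_min, clip_max)

def shaped_reward(env_reward, state, demo_trajectories, alpha=0.1, clip_max=1.0):
    """Combine the environment reward with the scaled distance bonus."""
    return env_reward + alpha * trajectory_distance_bonus(
        state, demo_trajectories, clip_max
    )
```

In practice the clipping threshold would be adapted during training (e.g., as the exploration scope expands) rather than fixed, which is the adaptive aspect the abstract refers to.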

Details

Database :
arXiv
Journal :
Knowledge-Based Systems 285 (2024) 111334
Publication Type :
Report
Accession number :
edsarx.2312.16456
Document Type :
Working Paper
Full Text :
https://doi.org/10.1016/j.knosys.2023.111334