Traffic navigation via reinforcement learning with episodic-guided prioritized experience replay.

Authors :
Hassani, Hossein
Nikan, Soodeh
Shami, Abdallah
Source :
Engineering Applications of Artificial Intelligence. Nov 2024, Part A, Vol. 137.
Publication Year :
2024

Abstract

Deep Reinforcement Learning (DRL) models play a fundamental role in autonomous driving applications; however, they typically suffer from sample inefficiency, often requiring many interactions with the environment to learn effective policies, which makes training time-consuming. Prioritized Experience Replay (PER) addresses this shortcoming by prioritizing samples with high Temporal-Difference (TD) error for learning. In this context, this study proposes a sample-efficient DRL algorithm called Episodic-Guided Prioritized Experience Replay (EPER). The core innovation of EPER is an episodic memory dedicated to storing successful training episodes, from which expected returns for each state–action pair are extracted. These returns, combined with TD error-based prioritization, form a novel objective function for deep Q-network training. To prevent excessive determinism, EPER introduces exploration into the learning process by incorporating a regularization term into the objective function that encourages visiting state-space regions with diverse Q-values. EPER is suitable for training a DRL agent on episodic tasks and can be integrated into off-policy DRL models. To showcase its application in engineering, EPER is employed for traffic navigation in scenarios such as highway driving, merging, roundabouts, and intersections. The results show that, compared with PER and an additional state-of-the-art training technique, EPER both expedites agent training and learns a more optimal policy, yielding lower collision rates in the constructed navigation scenarios.

• Proposed the Episodic-Guided Prioritized Experience Replay (EPER) algorithm.
• Integrates episodic information into deep Q-network training.
• Regularizes the DQN loss function for enhanced performance.
• Improves exploration–exploitation trade-off management.
• Enhances vehicle autonomy with EPER.

[ABSTRACT FROM AUTHOR]
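To make the mechanism described in the abstract concrete, below is a minimal Python/PyTorch sketch of an EPER-style objective. It is an illustration based only on the abstract, not the paper's implementation: the names (EpisodicMemory, eper_loss, lambda_epi, lambda_reg), the state discretization used as a memory key, and the variance-based exploration regularizer are all assumptions; the paper's exact objective, prioritization scheme, and regularization term may differ.

import numpy as np
import torch
import torch.nn.functional as F

def _key(state, action, decimals=1):
    # Discretize the state so continuous observations can index a dict
    # (hypothetical keying scheme, not from the paper).
    return (tuple(np.round(np.asarray(state, dtype=float), decimals)), int(action))

class EpisodicMemory:
    """Stores the best discounted return observed for each discretized
    (state, action) pair across successful episodes."""
    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.returns = {}

    def store_episode(self, transitions):
        # transitions: list of (state, action, reward) from a successful episode.
        g = 0.0
        for state, action, reward in reversed(transitions):
            g = reward + self.gamma * g  # accumulate discounted return backwards
            k = _key(state, action)
            self.returns[k] = max(self.returns.get(k, g), g)

    def lookup(self, state, action, default=0.0):
        return self.returns.get(_key(state, action), default)

def eper_loss(q_net, target_net, batch, memory,
              gamma=0.99, lambda_epi=0.1, lambda_reg=0.01):
    # The batch is assumed to be sampled with TD-error-proportional
    # priorities, as in standard PER (sampling code omitted).
    states, actions, rewards, next_states, dones = batch
    q_all = q_net(states)
    q = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(1).values
        td_target = rewards + gamma * (1.0 - dones) * q_next
    td_loss = F.mse_loss(q, td_target)  # standard DQN TD term
    # Pull Q-values toward the expected returns recorded in episodic memory.
    epi_target = torch.tensor(
        [memory.lookup(s, a) for s, a in zip(states.numpy(), actions.numpy())],
        dtype=q.dtype)
    epi_loss = F.mse_loss(q, epi_target)
    # Regularizer rewarding spread across action values, one plausible reading
    # of "exploration of state-space regions with diverse Q-values".
    reg = -q_all.var(dim=1).mean()
    return td_loss + lambda_epi * epi_loss + lambda_reg * reg

The sketch only shows how episodic returns and a diversity regularizer could augment a TD-error objective; the paper presumably weights and combines these ingredients differently.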

Details

Language :
English
ISSN :
0952-1976
Volume :
137
Database :
Academic Search Index
Journal :
Engineering Applications of Artificial Intelligence
Publication Type :
Academic Journal
Accession number :
179632152
Full Text :
https://doi.org/10.1016/j.engappai.2024.109147