
STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.

Authors :
Lee, Young Jae
Kim, Jaehoon
Kwak, Mingu
Park, Young Joon
Kim, Seoung Bum
Source :
Neural Networks. Mar2023, Vol. 160, p1-11. 11p.
Publication Year :
2023

Abstract

With the development of deep learning technology, deep reinforcement learning (DRL) has successfully built intelligent agents for sequential decision-making problems through interaction with image-based environments. However, learning from unlimited interaction is impractical and sample inefficient because training an agent requires much trial and error and numerous samples. One response to this problem is sample-efficient DRL, a research area that encourages learning effective state representations from limited interactions with image-based environments. Previous methods have surpassed human performance by training an RL agent with self-supervised learning and data augmentation to learn good state representations from a given interaction. However, most existing methods consider only the similarity of image observations, so they struggle to capture semantic representations. To address these challenges, we propose spatio-temporal and action-based contrastive representation (STACoRe) learning for sample-efficient DRL. STACoRe performs two forms of contrastive learning to learn proper state representations: one uses the agent's actions as pseudo labels, and the other uses spatio-temporal information. In particular, when performing the action-based contrastive learning, we propose a method that automatically selects data augmentation techniques suitable for each environment for stable model training. We train the model by simultaneously optimizing an action-based contrastive loss function and spatio-temporal contrastive loss functions in an end-to-end manner, which improves sample efficiency for DRL. We evaluate on 26 benchmark games in Atari 2600 with environment interaction limited to 100k steps. The experimental results confirm that our method is more sample efficient than existing methods. The code is available at https://github.com/dudwojae/STACoRe.

• Contrastive learning using the agent's actions as pseudo labels improves sample efficiency.
• Automatically selecting the optimal data augmentation improves training stability.
• Contrastive learning using spatio-temporal information further improves sample efficiency.

[ABSTRACT FROM AUTHOR]
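The abstract describes jointly optimizing an action-based contrastive loss (actions as pseudo labels, i.e. a supervised-contrastive objective) and a spatio-temporal contrastive loss. The sketch below is a minimal NumPy illustration of that combination, not the authors' implementation; the function names, the temperature value, and the weighting factor `lam` are assumptions for exposition.

```python
import numpy as np

def supcon_loss(z, labels, temp=0.1):
    """Action-based supervised contrastive loss (sketch).

    Embeddings whose anchors share the same action label are treated
    as positives and pulled together; all others are negatives.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / temp                              # cosine similarities / temp
    n = len(z)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchor has no positive: skip
        denom = sum(np.exp(sim[i, j]) for j in range(n) if j != i)
        total += -np.mean([np.log(np.exp(sim[i, j]) / denom) for j in pos])
    return total / n

def temporal_infonce_loss(z_t, z_tp1, temp=0.1):
    """Spatio-temporal InfoNCE loss (sketch).

    The embedding at time t should match its own successor at t+1
    (diagonal of the similarity matrix) against other time steps.
    """
    z_t = z_t / np.linalg.norm(z_t, axis=1, keepdims=True)
    z_tp1 = z_tp1 / np.linalg.norm(z_tp1, axis=1, keepdims=True)
    logits = z_t @ z_tp1.T / temp
    # cross-entropy with the positive pair on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def combined_loss(z_anchor, z_t, z_tp1, actions, lam=1.0):
    """End-to-end objective (assumed form): SupCon + lam * temporal InfoNCE."""
    return supcon_loss(z_anchor, actions) + lam * temporal_infonce_loss(z_t, z_tp1)
```

In the paper both losses are optimized simultaneously on encoder outputs; here `lam` is a hypothetical weighting between the two terms.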

Details

Language :
English
ISSN :
0893-6080
Volume :
160
Database :
Academic Search Index
Journal :
Neural Networks
Publication Type :
Academic Journal
Accession number :
162131554
Full Text :
https://doi.org/10.1016/j.neunet.2022.12.018