
Offline prompt reinforcement learning method based on feature extraction

Authors :
Tianlei Yao
Xiliang Chen
Yi Yao
Weiye Huang
Zhaoyang Chen
Source :
PeerJ Computer Science, Vol 11, p e2490 (2025)
Publication Year :
2025
Publisher :
PeerJ Inc., 2025.

Abstract

Recent studies have shown that combining Transformers with conditional sequence modeling yields better results in offline reinforcement learning. However, in a conventional reinforcement learning setting the agent receives single-frame observations one at a time in their natural chronological order, whereas a Transformer receives a whole series of observations at each step. Individual features therefore cannot be extracted efficiently enough for accurate decisions, and generalizing to out-of-distribution data remains difficult. We exploit the few-shot learning ability of pre-trained models and combine it with prompt learning to strengthen real-time policy adjustment. By sampling task-specific information from the offline dataset as trajectory samples, we encode the task information so that the pre-trained model quickly grasps task characteristics, and we use a sequence-generation paradigm to adapt rapidly to downstream tasks. To capture dependencies in the sequence more accurately, we also divide the state information in the input trajectory into fixed-size blocks, extract features from each sub-block separately, and finally encode the whole sequence with a GPT model to generate more accurate decisions. Experiments show that the proposed method outperforms baseline methods on related tasks, generalizes better to new environments and tasks, and effectively improves the stability and accuracy of agent decision making.
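The pipeline the abstract describes — splitting a state trajectory into fixed-size blocks, extracting features per block, and prepending encoded prompt tokens before the task sequence — can be sketched minimally as follows. All function names, the mean-pool-plus-linear-projection feature extractor, and the shapes are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import numpy as np

def chunk_states(states, block_size):
    """Split a (T, d) state sequence into fixed-size blocks, zero-padding the tail."""
    T, d = states.shape
    pad = (-T) % block_size
    if pad:
        states = np.vstack([states, np.zeros((pad, d))])
    return states.reshape(-1, block_size, d)  # (n_blocks, block_size, d)

def block_features(blocks, W):
    """Hypothetical per-block extractor: mean-pool each block, then project with W."""
    pooled = blocks.mean(axis=1)   # (n_blocks, d)
    return pooled @ W              # (n_blocks, k)

def build_input(prompt_traj, task_traj, block_size, W):
    """Encode a sampled trajectory prompt and the task trajectory the same way,
    then concatenate: prompt tokens first, task tokens after. The resulting
    token sequence would be fed to the GPT backbone in the described method."""
    prompt_tokens = block_features(chunk_states(prompt_traj, block_size), W)
    task_tokens = block_features(chunk_states(task_traj, block_size), W)
    return np.vstack([prompt_tokens, task_tokens])
```

For example, a 10-step trajectory with 4-dimensional states and a block size of 4 is padded to 12 steps and yields 3 block tokens; with an equally long prompt, the GPT backbone would see 6 tokens in total.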

Details

Language :
English
ISSN :
23765992
Volume :
11
Database :
Directory of Open Access Journals
Journal :
PeerJ Computer Science
Publication Type :
Academic Journal
Accession number :
edsdoj.97552487ac54660af78a234baadc81e
Document Type :
article
Full Text :
https://doi.org/10.7717/peerj-cs.2490