GROOT: Learning to follow instructions by watching gameplay videos

S Cai, B Zhang, Z Wang, X Ma, A Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study the problem of building a controller that can follow open-ended instructions in
open-world environments. We propose to follow reference videos as instructions, which offer …

Deep generative models for offline policy learning: Tutorial, survey, and perspectives on future directions

J Chen, B Ganguly, Y Xu, Y Mei, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep generative models (DGMs) have demonstrated great success across various domains,
particularly in generating texts, images, and videos using models trained from offline data …

Unleashing the power of pre-trained language models for offline reinforcement learning

R Shi, Y Liu, Y Ze, SS Du, H Xu - arXiv preprint arXiv:2310.20587, 2023 - arxiv.org
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected
datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline …

Reinformer: Max-return sequence modeling for offline RL

Z Zhuang, D Peng, J Liu, Z Zhang, D Wang - arXiv preprint arXiv …, 2024 - arxiv.org
As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as
sequence modeling that conditions on the hindsight information including returns, goal or …

Pre-training goal-based models for sample-efficient reinforcement learning

H Yuan, Z Mu, F Xie, Z Lu - The Twelfth International Conference on …, 2024 - openreview.net
Pre-training on task-agnostic large datasets is a promising approach for enhancing the
sample efficiency of reinforcement learning (RL) in solving complex tasks. We present …

Weighting online decision transformer with episodic memory for offline-to-online reinforcement learning

X Ma, WJ Li - 2024 IEEE International Conference on Robotics …, 2024 - ieeexplore.ieee.org
Offline reinforcement learning (RL) has been shown to be successfully modeled as a
sequence modeling problem, drawing inspiration from the success of Transformers. Offline …

Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges

X Liu, J Jiao, J Zhang - arXiv preprint arXiv:2401.00031, 2023 - arxiv.org
Decision-making is a dynamic process requiring perception, memory, and reasoning to
make choices and find optimal policies. Traditional approaches to decision-making suffer …

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

CX Gao, S Fang, C Xiao, Y Yu, Z Zhang - arXiv preprint arXiv:2407.04451, 2024 - arxiv.org
Offline preference-based reinforcement learning (RL), which focuses on optimizing policies
using human preferences between pairs of trajectory segments selected from an offline …

Enhancing Cross-domain Pre-Trained Decision Transformers with Adaptive Attention

W Zhao, Q Xu, L Xu, L Song, J Wang, C Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, the pre-training of decision transformers (DT) using a different domain, such as
natural language text, has generated significant attention in offline reinforcement learning …

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Z Wang, L Zhang, W Wu, Y Zhu, D Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
A longstanding goal of artificial general intelligence is highly capable generalists that can
learn from diverse experiences and generalize to unseen tasks. The language and vision …