Large sequence models for sequential decision-making: a survey

M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai… - Frontiers of Computer …, 2023 - Springer
Transformer architectures have facilitated the development of large-scale and general-
purpose sequence models for prediction tasks in natural language processing and computer …

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Multi-game decision transformers

KH Lee, O Nachum, MS Yang, L Lee… - Advances in …, 2022 - proceedings.neurips.cc
A longstanding goal of the field of AI is a method for learning a highly capable, generalist
agent from diverse experience. In the subfields of vision and language, this was largely …

Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press
In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …
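The core idea the Q-Transformer abstract alludes to is treating each discretized action dimension as one token and selecting actions autoregressively, one dimension at a time. A minimal sketch of that selection step, where `select_action` and the toy `q_fn` are hypothetical stand-ins for the paper's learned per-dimension Q-values:

```python
# Sketch of autoregressive action selection over discretized action
# dimensions (the Q-Transformer idea): a D-dimensional action is chosen
# one dimension at a time, conditioning on the dimensions already picked.
# `q_fn(dim, prefix, bin)` is a hypothetical stand-in for learned Q-values.

def select_action(q_fn, num_dims, num_bins):
    """Greedy autoregressive argmax over discretized action dimensions."""
    chosen = []
    for d in range(num_dims):
        # Q-values for every bin of dimension d, given the prefix chosen so far.
        q_vals = [q_fn(d, tuple(chosen), b) for b in range(num_bins)]
        chosen.append(max(range(num_bins), key=q_vals.__getitem__))
    return chosen

# Toy Q-function: prefers bin (dim + 1) % num_bins in each dimension.
toy_q = lambda d, prefix, b: 1.0 if b == (d + 1) % 3 else 0.0
print(select_action(toy_q, num_dims=2, num_bins=3))  # -> [1, 2]
```

This keeps the maximization over a combinatorial action space tractable: each step is an argmax over `num_bins` values rather than `num_bins ** num_dims`.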

Prompting decision transformer for few-shot policy generalization

M Xu, Y Shen, S Zhang, Y Lu, D Zhao… - International …, 2022 - proceedings.mlr.press
Humans can leverage prior experience and learn novel tasks from a handful of
demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve …

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also produced
significant successes in computer vision (CV). Due to their strong expression power …

In-context reinforcement learning with algorithm distillation

M Laskin, L Wang, J Oh, E Parisotto, S Spencer… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL)
algorithms into neural networks by modeling their training histories with a causal sequence …
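The distinctive ingredient in Algorithm Distillation is the training data layout: the sequence model sees whole learning histories (episodes concatenated in generation order), not isolated episodes, so improvement across episodes becomes part of what it predicts. A toy illustration of that layout, with all names hypothetical:

```python
# Hypothetical sketch of the cross-episodic data layout used by
# Algorithm Distillation: episodes from one RL training run are
# concatenated from early (poor) to late (good), and a causal sequence
# model is then trained on the flattened token stream.

def flatten_history(episodes):
    """Concatenate (obs, action, reward) triples across a whole training run."""
    tokens = []
    for ep in episodes:  # episodes ordered as they were generated
        for obs, action, reward in ep:
            tokens.extend([("obs", obs), ("act", action), ("rew", reward)])
    return tokens

# Toy two-episode history: the source agent improves between episodes.
history = [
    [("s0", "left", 0.0)],   # early episode, low reward
    [("s0", "right", 1.0)],  # later episode, higher reward
]
seq = flatten_history(history)
print(len(seq))  # 6 tokens: 3 per transition, 2 transitions
```

A model trained on many such streams can then improve in-context at test time, because "getting better over episodes" is itself a pattern in its training sequences.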

When does return-conditioned supervised learning work for offline reinforcement learning?

D Brandfonbrener, A Bietti, J Buckman… - Advances in …, 2022 - proceedings.neurips.cc
Several recent works have proposed a class of algorithms for the offline reinforcement
learning (RL) problem that we will refer to as return-conditioned supervised learning …
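The return-conditioned supervised learning (RCSL) family this abstract names (e.g. Decision Transformer-style methods) trains a policy to imitate dataset actions conditioned on the return-to-go. A minimal tabular sketch under toy assumptions (all names and the two-state dataset are hypothetical):

```python
# Minimal sketch of return-conditioned supervised learning (RCSL):
# actions are imitated conditioned on (state, return-to-go), so asking
# for a high target return reproduces high-return behavior.
from collections import defaultdict

def returns_to_go(rewards):
    """Suffix sums: g_t = sum of rewards from step t onward."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g += r
        out.append(g)
    return list(reversed(out))

def fit_rcsl(trajectories):
    """Tabular 'policy': map (state, return-to-go) -> most frequent action."""
    counts = defaultdict(lambda: defaultdict(int))
    for states, actions, rewards in trajectories:
        for s, a, g in zip(states, actions, returns_to_go(rewards)):
            counts[(s, g)][a] += 1
    return {k: max(v, key=v.get) for k, v in counts.items()}

# Two toy trajectories in a two-state world.
data = [
    (["s0", "s1"], ["right", "stay"], [1.0, 1.0]),  # total return 2.0
    (["s0", "s1"], ["left", "stay"], [0.0, 0.0]),   # total return 0.0
]
policy = fit_rcsl(data)
# Conditioning on the high target return selects the high-return action.
print(policy[("s0", 2.0)])  # -> "right"
```

The paper's question is precisely when this conditioning trick matches true offline RL; the sketch makes the mechanism, and its reliance on dataset coverage of the queried returns, concrete.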

Can wikipedia help offline reinforcement learning?

M Reid, Y Yamada, SS Gu - arXiv preprint arXiv:2201.12122, 2022 - arxiv.org
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of
large scale off-the-shelf datasets as well as high variance in transferability among different …

Constrained decision transformer for offline safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - International …, 2023 - proceedings.mlr.press
Safe reinforcement learning (RL) trains a constraint-satisfying policy by interacting with the
environment. We aim to tackle a more challenging problem: learning a safe policy from an …