Large sequence models for sequential decision-making: a survey

M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai… - Frontiers of Computer …, 2023 - Springer
Transformer architectures have facilitated the development of large-scale and general-
purpose sequence models for prediction tasks in natural language processing and computer …

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Multi-game decision transformers

KH Lee, O Nachum, MS Yang, L Lee… - Advances in …, 2022 - proceedings.neurips.cc
A longstanding goal of the field of AI is a method for learning a highly capable, generalist
agent from diverse experience. In the subfields of vision and language, this was largely …

Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press
In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …
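The core idea the Q-Transformer abstract alludes to is treating each discretized action dimension as one token and selecting actions autoregressively, one dimension at a time. A minimal sketch of that selection step, where `select_action` and the toy `q_fn` are hypothetical stand-ins for the paper's learned per-dimension Q-values:

```python
# Sketch of autoregressive action selection over discretized action
# dimensions (the Q-Transformer idea): a D-dimensional action is chosen
# one dimension at a time, conditioning on the dimensions already picked.
# `q_fn(dim, prefix, bin)` is a hypothetical stand-in for learned Q-values.

def select_action(q_fn, num_dims, num_bins):
    """Greedy autoregressive argmax over discretized action dimensions."""
    chosen = []
    for d in range(num_dims):
        # Q-values for every bin of dimension d, given the prefix chosen so far.
        q_vals = [q_fn(d, tuple(chosen), b) for b in range(num_bins)]
        chosen.append(max(range(num_bins), key=q_vals.__getitem__))
    return chosen

# Toy Q-function: prefers bin (dim + 1) % num_bins in each dimension.
toy_q = lambda d, prefix, b: 1.0 if b == (d + 1) % 3 else 0.0
print(select_action(toy_q, num_dims=2, num_bins=3))  # -> [1, 2]
```

This keeps the maximization over a combinatorial action space tractable: each step is an argmax over `num_bins` values rather than `num_bins ** num_dims`.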

Prompting decision transformer for few-shot policy generalization

M Xu, Y Shen, S Zhang, Y Lu, D Zhao… - International …, 2022 - proceedings.mlr.press
Humans can leverage prior experience and learn novel tasks from a handful of
demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve …

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also produced
significant successes in computer vision (CV). Due to their strong expression power …

In-context reinforcement learning with algorithm distillation

M Laskin, L Wang, J Oh, E Parisotto, S Spencer… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL)
algorithms into neural networks by modeling their training histories with a causal sequence …
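The distinctive ingredient in Algorithm Distillation is the training data layout: the sequence model sees whole learning histories (episodes concatenated in generation order), not isolated episodes, so improvement across episodes becomes part of what it predicts. A toy illustration of that layout, with all names hypothetical:

```python
# Hypothetical sketch of the cross-episodic data layout used by
# Algorithm Distillation: episodes from one RL training run are
# concatenated from early (poor) to late (good), and a causal sequence
# model is then trained on the flattened token stream.

def flatten_history(episodes):
    """Concatenate (obs, action, reward) triples across a whole training run."""
    tokens = []
    for ep in episodes:  # episodes ordered as they were generated
        for obs, action, reward in ep:
            tokens.extend([("obs", obs), ("act", action), ("rew", reward)])
    return tokens

# Toy two-episode history: the source agent improves between episodes.
history = [
    [("s0", "left", 0.0)],   # early episode, low reward
    [("s0", "right", 1.0)],  # later episode, higher reward
]
seq = flatten_history(history)
print(len(seq))  # 6 tokens: 3 per transition, 2 transitions
```

A model trained on many such streams can then improve in-context at test time, because "getting better over episodes" is itself a pattern in its training sequences.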

When does return-conditioned supervised learning work for offline reinforcement learning?

D Brandfonbrener, A Bietti, J Buckman… - Advances in …, 2022 - proceedings.neurips.cc
Several recent works have proposed a class of algorithms for the offline reinforcement
learning (RL) problem that we will refer to as return-conditioned supervised learning …
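The return-conditioned supervised learning (RCSL) family this abstract names (e.g. Decision Transformer-style methods) trains a policy to imitate dataset actions conditioned on the return-to-go. A minimal tabular sketch under toy assumptions (all names and the two-state dataset are hypothetical):

```python
# Minimal sketch of return-conditioned supervised learning (RCSL):
# actions are imitated conditioned on (state, return-to-go), so asking
# for a high target return reproduces high-return behavior.
from collections import defaultdict

def returns_to_go(rewards):
    """Suffix sums: g_t = sum of rewards from step t onward."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g += r
        out.append(g)
    return list(reversed(out))

def fit_rcsl(trajectories):
    """Tabular 'policy': map (state, return-to-go) -> most frequent action."""
    counts = defaultdict(lambda: defaultdict(int))
    for states, actions, rewards in trajectories:
        for s, a, g in zip(states, actions, returns_to_go(rewards)):
            counts[(s, g)][a] += 1
    return {k: max(v, key=v.get) for k, v in counts.items()}

# Two toy trajectories in a two-state world.
data = [
    (["s0", "s1"], ["right", "stay"], [1.0, 1.0]),  # total return 2.0
    (["s0", "s1"], ["left", "stay"], [0.0, 0.0]),   # total return 0.0
]
policy = fit_rcsl(data)
# Conditioning on the high target return selects the high-return action.
print(policy[("s0", 2.0)])  # -> "right"
```

The paper's question is precisely when this conditioning trick matches true offline RL; the sketch makes the mechanism, and its reliance on dataset coverage of the queried returns, concrete.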

Can wikipedia help offline reinforcement learning?

M Reid, Y Yamada, SS Gu - arXiv preprint arXiv:2201.12122, 2022 - arxiv.org
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of
large scale off-the-shelf datasets as well as high variance in transferability among different …

Constrained decision transformer for offline safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - International …, 2023 - proceedings.mlr.press
Safe reinforcement learning (RL) trains a constraint-satisfying policy by interacting with the
environment. We aim to tackle a more challenging problem: learning a safe policy from an …