Parrot: Data-driven behavioral priors for reinforcement learning

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 283 Related articles All 2 versions

[PDF] neurips.cc

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

Cited by 1698 Related articles All 11 versions

[PDF] neurips.cc

Behavior Transformers: Cloning modes with one stone

NM Shafiullah, Z Cui… - Advances in neural …, 2022 - proceedings.neurips.cc

While behavior learning has made impressive progress in recent times, it lags behind
computer vision and natural language processing due to its inability to leverage large …

Cited by 179 Related articles All 6 versions

[PDF] arxiv.org

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org

Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Cited by 134 Related articles All 3 versions

[PDF] arxiv.org

Imitating human behaviour with diffusion models

T Pearce, T Rashid, A Kanervisto, D Bignell… - arXiv preprint arXiv …, 2023 - arxiv.org

Diffusion models have emerged as powerful generative models in the text-to-image domain.
This paper studies their application as observation-to-action models for imitating human …

Cited by 171 Related articles All 6 versions

[PDF] mlr.press

Offline-to-online reinforcement learning via balanced replay and pessimistic q-ensemble

S Lee, Y Seo, K Lee, P Abbeel… - Conference on Robot …, 2022 - proceedings.mlr.press

Recent advance in deep offline reinforcement learning (RL) has made it possible to train
strong robotic agents from offline datasets. However, depending on the quality of the trained …

Cited by 192 Related articles All 5 versions

[PDF] arxiv.org

Goal-conditioned imitation learning using score-based diffusion policies

M Reuss, M Li, X Jia, R Lioutikov - arXiv preprint arXiv:2304.02532, 2023 - arxiv.org

We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation …

Cited by 114 Related articles All 6 versions

[PDF] arxiv.org

Can wikipedia help offline reinforcement learning?

M Reid, Y Yamada, SS Gu - arXiv preprint arXiv:2201.12122, 2022 - arxiv.org

Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of
large scale off-the-shelf datasets as well as high variance in transferability among different …

Cited by 97 Related articles All 4 versions

[PDF] arxiv.org

Offline reinforcement learning via high-fidelity generative behavior modeling

H Chen, C Lu, C Ying, H Su, J Zhu - arXiv preprint arXiv:2209.14548, 2022 - arxiv.org

In offline reinforcement learning, weighted regression is a common method to ensure the
learned policy stays close to the behavior policy and to prevent selecting out-of-sample …

Cited by 88 Related articles All 3 versions

[PDF] mlr.press

Representation matters: Offline pretraining for sequential decision making

M Yang, O Nachum - International Conference on Machine …, 2021 - proceedings.mlr.press

The recent success of supervised learning methods on ever larger offline datasets has
spurred interest in the reinforcement learning (RL) field to investigate whether the same …

Cited by 138 Related articles All 5 versions
