Rethinking goal-conditioned supervised learning and its connection to offline RL

S Levine, D Shah - … Transactions of the Royal Society B, 2023 - royalsocietypublishing.org

Navigation is one of the most heavily studied problems in robotics and is conventionally
approached as a geometric mapping and planning problem. However, real-world navigation …

Cited by 22 Related articles All 8 versions

[PDF] arxiv.org

Goal-conditioned imitation learning using score-based diffusion policies

M Reuss, M Li, X Jia, R Lioutikov - arXiv preprint arXiv:2304.02532, 2023 - arxiv.org

We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation …

Cited by 116 Related articles All 6 versions

[PDF] neurips.cc

HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc

Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Cited by 35 Related articles All 6 versions

[PDF] neurips.cc

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in neural …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) provides a promising direction to exploit massive amount
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

Cited by 79 Related articles All 8 versions

[PDF] neurips.cc

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based
and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Cited by 62 Related articles All 7 versions

[PDF] mlr.press

Hierarchical diffusion for offline decision making

W Li, X Wang, B Jin, H Zha - International Conference on …, 2023 - proceedings.mlr.press

Offline reinforcement learning typically introduces a hierarchical structure to solve the long-horizon
problem so as to address its thorny issue of variance accumulation. Problems of …

Cited by 26 Related articles All 5 versions

[PDF] arxiv.org

From play to policy: Conditional behavior generation from uncurated robot data

ZJ Cui, Y Wang, NMM Shafiullah, L Pinto - arXiv preprint arXiv:2210.10047, 2022 - arxiv.org

While large-scale sequence modeling from offline data has led to impressive performance
gains in natural language and image generation, directly translating such ideas to robotics …

Cited by 71 Related articles All 3 versions

[PDF] neurips.cc

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

Cited by 10 Related articles All 5 versions

[PDF] neurips.cc

Exploit reward shifting in value-based deep-RL: Optimistic curiosity-based exploration and conservative exploitation via linear reward shaping

H Sun, L Han, R Yang, X Ma… - Advances in neural …, 2022 - proceedings.neurips.cc

In this work, we study the simple yet universally applicable case of reward shaping in value-based
Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a …

Cited by 23 Related articles All 3 versions

[PDF] neurips.cc

Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

JY Ma, J Yan, D Jayaraman… - Advances in neural …, 2022 - proceedings.neurips.cc

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill
learning in the form of reaching diverse goals from purely offline datasets. We propose …

Cited by 29 Related articles All 6 versions
