Learning robotic navigation from experience: principles, methods and recent results

S Levine, D Shah - … Transactions of the Royal Society B, 2023 - royalsocietypublishing.org
Navigation is one of the most heavily studied problems in robotics and is conventionally
approached as a geometric mapping and planning problem. However, real-world navigation …

Goal-conditioned imitation learning using score-based diffusion policies

M Reuss, M Li, X Jia, R Lioutikov - arXiv preprint arXiv:2304.02532, 2023 - arxiv.org
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation …

HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) provides a promising direction to exploit massive amount
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Hierarchical diffusion for offline decision making

W Li, X Wang, B Jin, H Zha - International Conference on …, 2023 - proceedings.mlr.press
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-
horizon problem so as to address its thorny issue of variance accumulation. Problems of …

From play to policy: Conditional behavior generation from uncurated robot data

ZJ Cui, Y Wang, NMM Shafiullah, L Pinto - arXiv preprint arXiv:2210.10047, 2022 - arxiv.org
While large-scale sequence modeling from offline data has led to impressive performance
gains in natural language and image generation, directly translating such ideas to robotics …

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

Exploit reward shifting in value-based deep-RL: Optimistic curiosity-based exploration and conservative exploitation via linear reward shaping

H Sun, L Han, R Yang, X Ma… - Advances in neural …, 2022 - proceedings.neurips.cc
In this work, we study the simple yet universally applicable case of reward shaping in value-
based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a …

Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

JY Ma, J Yan, D Jayaraman… - Advances in neural …, 2022 - proceedings.neurips.cc
Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill
learning in the form of reaching diverse goals from purely offline datasets. We propose …