HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Elastic decision transformer

YH Wu, X Wang, M Hamaya - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper introduces Elastic Decision Transformer (EDT), a significant
advancement over the existing Decision Transformer (DT) and its variants. Although DT …

When to trust your simulator: Dynamics-aware hybrid offline-and-online reinforcement learning

H Niu, Y Qiu, M Li, G Zhou, J Hu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Learning effective reinforcement learning (RL) policies to solve real-world complex tasks
can be quite challenging without a high-fidelity simulation environment. In most cases, we …

Offline multi-agent reinforcement learning with implicit global-to-local value regularization

X Wang, H Xu, Y Zheng, X Zhan - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) has received considerable attention in recent years due
to its attractive capability of learning policies from offline datasets without environmental …

PROTO: Iterative policy regularized offline-to-online reinforcement learning

J Li, X Hu, H Xu, J Liu, X Zhan, YQ Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …

Safe offline reinforcement learning with feasibility-guided diffusion model

Y Zheng, J Li, D Yu, Y Yang, SE Li, X Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
Safe offline RL is a promising way to bypass risky online interactions towards safe policy
learning. Most existing methods only enforce soft constraints, i.e., constraining safety …

PLEX: Making the most of the available data for robotic manipulation pretraining

G Thomas, CA Cheng, R Loynd… - … on Robot Learning, 2023 - proceedings.mlr.press
A rich representation is key to general robotic manipulation, but existing approaches to
representation learning require large amounts of multimodal demonstrations. In this work we …

Mind the gap: Offline policy optimization for imperfect rewards

J Li, X Hu, H Xu, J Liu, X Zhan, QS Jia… - arXiv preprint arXiv …, 2023 - arxiv.org
The reward function is essential in reinforcement learning (RL), serving as the guiding signal
that incentivizes agents to solve given tasks; however, it is also notoriously difficult to design. In …

Look beneath the surface: Exploiting fundamental symmetry for sample-efficient offline RL

P Cheng, X Zhan, W Zhang, Y Lin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by
learning policies from pre-collected datasets without interacting with the environment …

Diffusion-DICE: In-sample diffusion guidance for offline reinforcement learning

L Mao, H Xu, X Zhan, W Zhang, A Zhang - arXiv preprint arXiv:2407.20109, 2024 - arxiv.org
One important property of DIstribution Correction Estimation (DICE) methods is that the
solution is the optimal stationary distribution ratio between the optimized and data collection …