IDQL: Implicit Q-learning as an actor-critic method with diffusion policies

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-
learning (IQL) addresses this by training a Q-function using only dataset actions through a …
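The IQL mechanism mentioned in the snippet can be illustrated through its value-fitting step. In the original IQL formulation, a state value V(s) is fit by expectile regression to the Q-values of dataset actions only, so no out-of-distribution action is ever queried. The sketch below is a minimal, framework-free illustration of that idea. `expectile_loss` and `fit_expectile` are hypothetical names, and the bisection solver stands in for the gradient-based training a real implementation would use.

```python
from typing import Sequence


def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2, with diff = Q(s, a) - V(s)."""
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2


def fit_expectile(q_values: Sequence[float], tau: float = 0.7, iters: int = 100) -> float:
    """Find the V minimizing the summed expectile loss over Q-values of dataset actions.

    The loss is convex in V, so bisection on the sign of its derivative converges.
    (Hypothetical helper: real IQL trains a neural V(s) by gradient descent.)
    """
    lo, hi = min(q_values), max(q_values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        # d/dV of sum_i w_i * (q_i - V)^2  =  sum_i -2 * w_i * (q_i - V)
        grad = sum(-2.0 * (tau if q > mid else 1.0 - tau) * (q - mid) for q in q_values)
        if grad > 0:
            hi = mid  # minimum lies to the left
        else:
            lo = mid  # minimum lies to the right (or at mid)
    return (lo + hi) / 2


# Q-values of the actions that actually appear in the dataset for one state.
dataset_q = [1.0, 2.0, 3.0]
print(fit_expectile(dataset_q, tau=0.5))  # the mean when tau = 0.5
print(fit_expectile(dataset_q, tau=0.9))  # pulled toward the best dataset action
```

With tau = 0.5 the fit reduces to the mean; larger tau pushes V toward the best in-dataset actions, which is how IQL approximates a maximum over actions without evaluating unseen ones.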

HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
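The snippet refers to the conventional pipeline in which a reward model is first learned from pairwise human preferences; this is the step IPL aims to remove, so the sketch below is not IPL's own method. A common choice for that reward-learning step is the Bradley-Terry model: the probability of preferring segment A over segment B is the logistic function of the difference in their summed predicted rewards, and the reward model is trained with cross-entropy on preference labels. This is a minimal sketch; the function names are hypothetical, and rewards are plain lists of floats standing in for a learned reward network's outputs.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_prob(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B."""
    return math.exp(log_sigmoid(sum(rewards_a) - sum(rewards_b)))


def preference_loss(rewards_a: Sequence[float], rewards_b: Sequence[float], label: float) -> float:
    """Cross-entropy loss; label = 1.0 if A was preferred, 0.0 if B was preferred."""
    d = sum(rewards_a) - sum(rewards_b)
    # 1 - sigmoid(d) == sigmoid(-d), so both terms use the stable log-sigmoid.
    return -(label * log_sigmoid(d) + (1.0 - label) * log_sigmoid(-d))


print(preference_prob([1.0, 1.0], [1.0, 1.0]))  # equal returns -> 0.5
print(preference_loss([2.0], [0.0], 1.0))       # small: model agrees with the label
print(preference_loss([2.0], [0.0], 0.0))       # large: model disagrees with the label
```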

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Critic-guided decision transformer for offline reinforcement learning

Y Wang, C Yang, Y Wen, Y Liu, Y Qiao - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Recent advancements in offline reinforcement learning (RL) have underscored the
capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the …
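A common ingredient of RCSL methods such as Decision Transformer is to condition the policy on the return-to-go at each timestep and to train it by supervised learning to reproduce the dataset action. The sketch below computes returns-to-go for one trajectory and builds (state, return-to-go) to action training pairs. The helper names are hypothetical, and the sketch does not reflect the critic guidance proposed in this particular paper.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """R_t = r_t + gamma * R_{t+1}, computed backwards over one trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def rcsl_pairs(states: Sequence[object], actions: Sequence[object],
               rewards: Sequence[float]) -> List[Tuple[Tuple[object, float], object]]:
    """Pair each (state, return-to-go) input with the dataset action as the supervised target."""
    rtg = returns_to_go(rewards)
    return [((s, g), a) for s, g, a in zip(states, rtg, actions)]


pairs = rcsl_pairs(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 2.0, 3.0])
print(pairs)  # [(('s0', 6.0), 'a0'), (('s1', 5.0), 'a1'), (('s2', 3.0), 'a2')]
```

At test time the same policy is queried with a chosen target return in place of the dataset return-to-go.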

Offline multi-agent reinforcement learning with implicit global-to-local value regularization

X Wang, H Xu, Y Zheng, X Zhan - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) has received considerable attention in recent years due
to its attractive capability of learning policies from offline datasets without environmental …

Towards robust offline reinforcement learning under diverse data corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

Uni-O4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization

K Lei, Z He, C Lu, K Hu, Y Gao, H Xu - arXiv preprint arXiv:2311.03351, 2023 - arxiv.org
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …

PROTO: Iterative policy regularized offline-to-online reinforcement learning

J Li, X Hu, H Xu, J Liu, X Zhan, YQ Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …

Safe offline reinforcement learning with feasibility-guided diffusion model

Y Zheng, J Li, D Yu, Y Yang, SE Li, X Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
Safe offline RL is a promising way to bypass risky online interactions towards safe policy
learning. Most existing methods only enforce soft constraints, i.e., constraining safety …