Conservative Q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

Offline reinforcement learning with implicit Q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Mildly conservative Q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …

Uncertainty-based offline reinforcement learning with diversified Q-ensemble

G An, S Moon, JH Kim… - Advances in neural …, 2021 - proceedings.neurips.cc
Offline reinforcement learning (offline RL), which aims to find an optimal policy from a
previously collected static dataset, bears algorithmic difficulties due to function …

Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL

T Yamagata, A Khalil… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent works have shown that tackling offline reinforcement learning (RL) with a conditional
policy produces promising results. The Decision Transformer (DT) combines the conditional …

Online and offline reinforcement learning by planning with a learned model

J Schrittwieser, T Hubert, A Mandhane… - Advances in …, 2021 - proceedings.neurips.cc
Learning efficiently from small amounts of data has long been the focus of model-based
reinforcement learning, both for the online case when interacting with the environment, and …

Confidence-conditioned value functions for offline reinforcement learning

J Hong, A Kumar, S Levine - arXiv preprint arXiv:2212.04607, 2022 - arxiv.org
Offline reinforcement learning (RL) promises the ability to learn effective policies solely
using existing, static datasets, without any costly online interaction. To do so, offline RL …

EMaQ: Expected-max Q-learning operator for simple yet effective offline and online RL

SKS Ghasemipour, D Schuurmans… - … on Machine Learning, 2021 - proceedings.mlr.press
Off-policy reinforcement learning (RL) holds the promise of sample-efficient learning of
decision-making policies by leveraging past experience. However, in the offline RL setting …

Offline RL with no OOD actions: In-sample learning via implicit value regularization

H Xu, L Jiang, J Li, Z Yang, Z Wang, VWK Chan… - arXiv preprint arXiv …, 2023 - arxiv.org
Most offline reinforcement learning (RL) methods suffer from the trade-off between improving
the policy to surpass the behavior policy and constraining the policy to limit the deviation …