Dataset Clustering for Improved Offline Policy Learning

Q Wang, Y Deng, FR Sanchez, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline policy learning aims to discover decision-making policies from previously-collected
datasets without additional online interactions with the environment. As the training dataset …

Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Z Huang, S Sun, J Zhao - Knowledge-Based Systems, 2024 - Elsevier
Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding
costly or risky interactions with the environment. In the offline setting, the inherent problem of …

Model-based offline policy optimization with distribution correcting regularization

J Shen, M Chen, Z Zhang, Z Yang, W Zhang… - Machine Learning and …, 2021 - Springer
Offline Reinforcement Learning (RL) aims at learning effective policies by
leveraging previously collected datasets without further exploration in environments. Model …

Conservative Bayesian model-based value expansion for offline policy optimization

J Jeong, X Wang, M Gimelfarb, H Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) addresses the problem of learning a performant policy
from a fixed batch of data collected by following some behavior policy. Model-based …

Deep generative models for offline policy learning: Tutorial, survey, and perspectives on future directions

J Chen, B Ganguly, Y Xu, Y Mei, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep generative models (DGMs) have demonstrated great success across various domains,
particularly in generating texts, images, and videos using models trained from offline data …

Policy regularization with dataset constraint for offline reinforcement learning

Y Ran, YC Li, F Zhang, Z Zhang… - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of learning the best possible policy from a fixed dataset, known as
offline Reinforcement Learning (RL). A common taxonomy of existing offline RL works is …

Fine-tuning offline reinforcement learning with model-based policy optimization

A Villaflor, J Dolan, J Schneider - 2020 - openreview.net
In offline reinforcement learning (RL), we attempt to learn a control policy from a fixed
dataset of environment interactions. This setting has the potential benefit of allowing us to …

Off-policy policy gradient algorithms by constraining the state distribution shift

R Islam, KK Teru, D Sharma, J Pineau - arXiv preprint arXiv:1911.06970, 2019 - arxiv.org
Off-policy deep reinforcement learning (RL) algorithms are incapable of learning solely from
batch offline data without online interactions with the environment, due to the phenomenon …

Online policy learning from offline preferences

G Zhang, H Bao, H Kashima - arXiv preprint arXiv:2403.10160, 2024 - arxiv.org
In preference-based reinforcement learning (PbRL), a reward function is learned from a type
of human feedback called preference. To expedite preference collection, recent works have …

Behavioral priors and dynamics models: Improving performance and domain transfer in offline RL

C Cang, A Rajeswaran, P Abbeel, M Laskin - arXiv preprint arXiv …, 2021 - arxiv.org
Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect
offline data without additional environment interactions. Extracting policies from diverse …