Related articles - Academic Resource Search

Dataset Clustering for Improved Offline Policy Learning

Q Wang, Y Deng, FR Sanchez, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Offline policy learning aims to discover decision-making policies from previously-collected
datasets without additional online interactions with the environment. As the training dataset …

Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Z Huang, S Sun, J Zhao - Knowledge-Based Systems, 2024 - Elsevier

Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding
costly or risky interactions with the environment. In the offline setting, the inherent problem of …

[PDF] amazonaws.com

Model-based offline policy optimization with distribution correcting regularization

J Shen, M Chen, Z Zhang, Z Yang, W Zhang… - Machine Learning and …, 2021 - Springer

Abstract Offline Reinforcement Learning (RL) aims at learning effective policies by
leveraging previously collected datasets without further exploration in environments. Model …

Cited by 5 · Related articles · All 4 versions

[PDF] arxiv.org

Conservative bayesian model-based value expansion for offline policy optimization

J Jeong, X Wang, M Gimelfarb, H Kim… - arXiv preprint arXiv …, 2022 - arxiv.org

Offline reinforcement learning (RL) addresses the problem of learning a performant policy
from a fixed batch of data collected by following some behavior policy. Model-based …

Cited by 9 · Related articles · All 5 versions

[PDF] arxiv.org

Deep generative models for offline policy learning: Tutorial, survey, and perspectives on future directions

J Chen, B Ganguly, Y Xu, Y Mei, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org

Deep generative models (DGMs) have demonstrated great success across various domains,
particularly in generating texts, images, and videos using models trained from offline data …

Cited by 2 · Related articles · All 3 versions

[PDF] mlr.press

Policy regularization with dataset constraint for offline reinforcement learning

Y Ran, YC Li, F Zhang, Z Zhang… - … Conference on Machine …, 2023 - proceedings.mlr.press

We consider the problem of learning the best possible policy from a fixed dataset, known as
offline Reinforcement Learning (RL). A common taxonomy of existing offline RL works is …

Cited by 12 · Related articles · All 6 versions

[PDF] openreview.net

Fine-tuning offline reinforcement learning with model-based policy optimization

A Villaflor, J Dolan, J Schneider - 2020 - openreview.net

In offline reinforcement learning (RL), we attempt to learn a control policy from a fixed
dataset of environment interactions. This setting has the potential benefit of allowing us to …

Cited by 9 · Related articles · All 3 versions

[PDF] arxiv.org

Off-policy policy gradient algorithms by constraining the state distribution shift

R Islam, KK Teru, D Sharma, J Pineau - arXiv preprint arXiv:1911.06970, 2019 - arxiv.org

Off-policy deep reinforcement learning (RL) algorithms are incapable of learning solely from
batch offline data without online interactions with the environment, due to the phenomenon …

Cited by 9 · Related articles · All 2 versions

[PDF] arxiv.org

Online Policy Learning from Offline Preferences

G Zhang, H Bao, H Kashima - arXiv preprint arXiv:2403.10160, 2024 - arxiv.org

In preference-based reinforcement learning (PbRL), a reward function is learned from a type
of human feedback called preference. To expedite preference collection, recent works have …

Behavioral priors and dynamics models: Improving performance and domain transfer in offline rl

C Cang, A Rajeswaran, P Abbeel, M Laskin - arXiv preprint arXiv …, 2021 - arxiv.org

Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect
offline data without additional environment interactions. Extracting policies from diverse …

Cited by 24 · Related articles · All 3 versions


