Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets

ZW Hong, A Kumar, S Karnik… - Advances in …, 2023 - proceedings.neurips.cc
Offline reinforcement learning (RL) enables learning a decision-making policy without
interaction with the environment. This makes it particularly beneficial in situations where …

Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage

J Blanchet, M Lu, T Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study distributionally robust offline reinforcement learning (RL), which seeks to find an
optimal robust policy purely from an offline dataset that can perform well in perturbed …

Revisiting the linear-programming framework for offline RL with general function approximation

AE Ozdaglar, S Pattathil, J Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press
Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-
making using a pre-collected dataset, without further interaction with the environment …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

When is realizability sufficient for off-policy reinforcement learning?

A Zanette - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Understanding when reinforcement learning algorithms can make successful off-policy
predictions, and when they may fail to do so, remains an open problem. Typically, model …

Importance weighted actor-critic for optimal conservative offline reinforcement learning

H Zhu, P Rashidinejad, J Jiao - Advances in Neural …, 2024 - proceedings.neurips.cc
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new
practical algorithm for offline reinforcement learning (RL) in complex environments with …

Policy learning" without''overlap: Pessimism and generalized empirical Bernstein's inequality

Y Jin, Z Ren, Z Yang, Z Wang - arXiv preprint arXiv:2212.09900, 2022 - arxiv.org
This paper studies offline policy learning, which aims at utilizing observations collected a
priori (from either fixed or adaptively evolving behavior policies) to learn the optimal …

On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond

T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …

Offline primal-dual reinforcement learning for linear MDPs

G Gabbianelli, G Neu, M Papini… - … Conference on Artificial …, 2024 - proceedings.mlr.press
Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed
dataset of transitions collected by another policy. This problem has attracted a lot of attention …

Harnessing density ratios for online reinforcement learning

P Amortila, DJ Foster, N Jiang, A Sekhari… - arXiv preprint arXiv …, 2024 - arxiv.org
The theories of offline and online reinforcement learning, despite having evolved in parallel,
have begun to show signs that a unification may be possible, with algorithms and analysis …