Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
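
As a reading aid: the paper's pessimistic value iteration subtracts an uncertainty penalty from the fitted Bellman backup before truncation. Schematically (notation lightly adapted from the paper; Γ_h is their ξ-uncertainty quantifier):

    \[
    \hat{Q}_h(s,a) \;=\; \min\!\Big\{\, \hat{r}_h(s,a) \;+\; \big(\widehat{\mathbb{P}}_h \hat{V}_{h+1}\big)(s,a) \;-\; \Gamma_h(s,a),\;\; H-h+1 \,\Big\}^{+}.
    \]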

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …
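
Single-policy concentrability, the data assumption in the title, only requires the behavior distribution μ to cover some optimal policy π⋆ rather than all candidate policies:

    \[
    C^{\pi^\star} \;:=\; \sup_{s,a}\ \frac{d^{\pi^\star}(s,a)}{\mu(s,a)} \;<\; \infty .
    \]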

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
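
A schematic caricature of a pessimistic actor-critic loop of this kind (not the paper's exact procedure): the critic returns a pessimistic value over a data-driven confidence set C(D) of Bellman-consistent critics, and the actor performs a soft (exponentiated-gradient) update:

    \[
    \underline{Q}^{\,t} \;\in\; \arg\min_{Q \in \mathcal{C}(\mathcal{D})}\ \mathbb{E}_{s_0}\big[\,Q(s_0, \pi_t)\,\big],
    \qquad
    \pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s)\, \exp\!\big(\eta\, \underline{Q}^{\,t}(s,a)\big).
    \]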

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics, Vol. 52, No. 1, pp. 233–260, 2024. https://doi.org/10.1214/23-AOS2342
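
For γ-discounted tabular MDPs, the minimax-optimal sample complexity settled in this line of work scales, up to logarithmic factors, as

    \[
    \widetilde{O}\!\left( \frac{S\, C^{\star}}{(1-\gamma)^{3}\, \varepsilon^{2}} \right),
    \]

where S is the number of states, C⋆ the single-policy (clipped) concentrability coefficient of the dataset, and ε the target accuracy; a pessimistic model-based algorithm attains this rate for the full range of ε.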

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Hybrid RL: Using both offline and online data can make RL efficient

Y Song, Y Zhou, A Sekhari, JA Bagnell… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has
access to an offline dataset and the ability to collect experience via real-world online …
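
Schematically, the hybrid setting can be exercised with mixed minibatches: each value update draws from both the fixed offline dataset and a growing online replay buffer. A minimal Python sketch of this generic idea (not the paper's Hy-Q algorithm; all names are hypothetical):

    import random

    def sample_mixed_batch(offline_data, online_buffer, batch_size):
        """Draw half of each update batch from the offline dataset and the
        rest from the online replay buffer, falling back to offline data
        while the online buffer is still small."""
        k = batch_size // 2
        n_online = min(k, len(online_buffer))
        batch = random.sample(online_buffer, n_online)
        batch += random.sample(offline_data, batch_size - n_online)
        return batch

    # Dummy (s, a, r, s') transitions just to exercise the sampler.
    offline = [(s, 0, 0.0, s + 1) for s in range(1000)]
    online = [(s, 1, 1.0, s + 1) for s in range(10)]
    print(len(sample_mixed_batch(offline, online, 32)))  # -> 32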

The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org
Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …
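
The classical (all-policy) coverage condition studied in this context bounds the state-action density ratio uniformly over policies:

    \[
    C_{\mathrm{conc}} \;:=\; \sup_{\pi}\ \sup_{s,a}\ \frac{d^{\pi}(s,a)}{\mu(s,a)} \;<\; \infty ,
    \]

i.e., the logging distribution μ covers every occupancy distribution reachable by some policy.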

Minimax weight and Q-function learning for off-policy evaluation

M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press
We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …
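
The minimax weight learning (MWL) side of this paper learns the marginalized importance weight w(s,a) ≈ d^π(s,a)/μ(s,a) by driving the discrepancy below to zero against a class of discriminators f, then estimates the discounted return as an importance-weighted average of rewards (notation lightly adapted; f(s, π) := Σ_a π(a|s) f(s,a)):

    \[
    L(w, f) \;=\; \mathbb{E}_{\mu}\Big[\, w(s,a)\big(\gamma f(s', \pi) - f(s,a)\big) \Big] \;+\; (1-\gamma)\, \mathbb{E}_{s_0 \sim d_0}\big[ f(s_0, \pi) \big],
    \qquad
    \hat{J}(\pi) \;=\; \frac{1}{1-\gamma}\, \mathbb{E}_{\mu}\big[ \hat{w}(s,a)\, r \big].
    \]

The dual estimator (MQL) swaps the roles: it learns a Q-function with weight functions as discriminators.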

Batch value-function approximation with only realizability

T Xie, N Jiang - International Conference on Machine …, 2021 - proceedings.mlr.press
We make progress in a long-standing problem of batch reinforcement learning (RL):
learning Q* from an exploratory and polynomial-sized dataset, using a realizable and …
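
The contrast driving this result: realizability asks only that the optimal Q-function lie in the class F, whereas the stronger Bellman-completeness asks F to be closed under the Bellman optimality operator:

    \[
    \text{realizability:}\quad Q^{\star} \in \mathcal{F};
    \qquad
    \text{completeness:}\quad \mathcal{T} f \in \mathcal{F}\ \ \forall f \in \mathcal{F},
    \quad
    (\mathcal{T} f)(s,a) \;:=\; r(s,a) + \gamma\, \mathbb{E}_{s'}\big[\max_{a'} f(s', a')\big].
    \]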

The importance of pessimism in fixed-dataset policy optimization

J Buckman, C Gelada, MG Bellemare - arXiv preprint arXiv:2009.06799, 2020 - arxiv.org
We study worst-case guarantees on the expected return of fixed-dataset policy optimization
algorithms. Our core contribution is a unified conceptual and mathematical framework for the …
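
Written schematically (not as the paper's exact statement), the pessimistic template such a framework organizes is to select the policy maximizing a penalized value estimate,

    \[
    \hat{\pi} \;\in\; \arg\max_{\pi}\ \hat{J}_{\mathcal{D}}(\pi) \;-\; \alpha\, u_{\mathcal{D}}(\pi),
    \]

where \(\hat{J}_{\mathcal{D}}\) is the value estimated from dataset D, \(u_{\mathcal{D}}\) an uncertainty measure, and α a pessimism coefficient trading off the two.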