Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping...

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 204 · Related articles · All 6 versions

[PDF] mlr.press

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Cited by 105 · Related articles · All 10 versions

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by 244 · Related articles · All 7 versions

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation. For episodic
time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Cited by 59 · Related articles · All 7 versions

[PDF] arxiv.org

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org

The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 © …

Cited by 89 · Related articles · All 8 versions

[PDF] neurips.cc

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc

We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Cited by 94 · Related articles · All 7 versions

[PDF] arxiv.org

The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org

Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …

Cited by 76 · Related articles · All 4 versions

[PDF] mlr.press

Reward-free RL is no harder than reward-aware RL in linear Markov decision processes

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press

Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

Cited by 67 · Related articles · All 7 versions

[PDF] openreview.net

Should I run offline reinforcement learning or behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - International Conference on …, 2021 - openreview.net

Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only
previously collected experience, without any online interaction. While it is widely understood …

Cited by 79 · Related articles · All 2 versions

[PDF] mlr.press

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Cited by 47 · Related articles · All 5 versions
