Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - International Conference on Machine Learning, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
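
The central device here is a lower-confidence-bound penalty subtracted from the Q-learning target, so that state-action pairs with little coverage in the batch are valued pessimistically. A minimal sketch of one such update under tabular assumptions; the function name, step size, and Hoeffding-style constant below are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def pessimistic_q_update(Q, N, s, a, r, s_next, gamma=0.99, c=1.0, delta=0.01):
    """One LCB-style Q-learning step on an offline transition (s, a, r, s_next).

    Q: (S, A) value table; N: (S, A) visit counts from the batch.
    The penalty b shrinks as coverage of (s, a) grows, so rarely seen
    pairs stay pessimistically valued.
    """
    N[s, a] += 1
    eta = 1.0 / N[s, a]                              # simple 1/n step size (illustrative)
    b = c * np.sqrt(np.log(1.0 / delta) / N[s, a])   # count-based pessimism penalty
    target = r + gamma * Q[s_next].max() - b         # penalized TD target
    Q[s, a] = (1 - eta) * Q[s, a] + eta * target
    return Q, N
```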

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning Theory, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
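
In a linear mixture MDP the transition kernel combines a known feature map with an unknown d-dimensional parameter, P(s'|s,a) = ⟨θ, φ(s'|s,a)⟩, and value-targeted regression recovers θ by regressing realized next-state values onto aggregated features. A sketch of that regression step alone, assuming the features are precomputed (names and the ridge parameter are illustrative):

```python
import numpy as np

def value_targeted_regression(feats, targets, lam=1.0):
    """Ridge estimate of theta in a linear mixture MDP (sketch).

    UCRL-VTR-style methods regress the observed next-state value V(s')
    onto the aggregated feature x = sum_{s'} phi(s'|s,a) V(s'),
    since E[V(s') | s, a] = <theta, x>.

    feats:   (n, d) array of aggregated features x_i
    targets: (n,)  array of realized values V(s'_i)
    """
    d = feats.shape[1]
    A = lam * np.eye(d) + feats.T @ feats   # regularized Gram matrix
    b = feats.T @ targets
    return np.linalg.solve(A, b)            # theta_hat
```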

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics, 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342
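
The model-based recipe is to fit an empirical transition model from the batch and run value iteration with a count-based penalty (VI-LCB style). A rough sketch under tabular assumptions; the paper's actual analysis hinges on sharper, variance-aware penalties than the crude Hoeffding term used here:

```python
import numpy as np

def offline_vi_lcb(counts, rewards, gamma=0.95, c=1.0, iters=200):
    """Pessimistic value iteration on an empirical model (illustrative sketch).

    counts:  (S, A, S) visit counts from the offline dataset
    rewards: (S, A) known (or estimated) mean rewards
    """
    S, A, _ = counts.shape
    n = counts.sum(axis=2)                          # (S, A) visit counts
    P_hat = counts / np.maximum(n, 1)[..., None]    # empirical transitions
    b = c / np.sqrt(np.maximum(n, 1))               # crude Hoeffding-style penalty
    V = np.zeros(S)
    for _ in range(iters):
        Q = rewards - b + gamma * P_hat @ V         # pessimistic Bellman backup
        V = np.maximum(Q.max(axis=1), 0.0)          # clip values at zero
    return Q, V
```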

Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2021 - proceedings.mlr.press
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

D Zhou, Q Gu - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …
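
A key computational ingredient in this line of work is variance-weighted ridge regression: samples with small estimated conditional variance receive large weights, which is what removes the polynomial horizon dependence. A sketch of that solver alone, with illustrative names and a variance floor alpha introduced here as an assumption:

```python
import numpy as np

def variance_weighted_ridge(feats, targets, var_est, alpha=1e-2, lam=1.0):
    """Variance-weighted ridge regression (sketch of the workhorse behind
    horizon-free algorithms for linear mixture MDPs).

    Each sample is down-weighted by its estimated conditional variance,
    floored at alpha so no single weight blows up:
        theta_hat = argmin sum_i (y_i - <theta, x_i>)^2 / sigma2_i + lam ||theta||^2
    """
    w = 1.0 / np.maximum(var_est, alpha)            # per-sample weights
    A = lam * np.eye(feats.shape[1]) + (feats * w[:, None]).T @ feats
    b = (feats * w[:, None]).T @ targets
    return np.linalg.solve(A, b)
```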

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information Theory, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
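
Here the data is a single Markovian sample path, and only the currently visited state-action pair is updated at each step. A sketch of such an asynchronous update loop with a visitation-based pessimism penalty; the step-size schedule and constants are illustrative, not the paper's:

```python
import numpy as np

def async_pessimistic_q(trajectory, S, A, gamma=0.99, c=1.0):
    """Asynchronous Q-learning over one Markovian trajectory (sketch).

    Unlike the synchronous setting, only the visited pair (s_t, a_t) is
    updated at step t; pessimism enters through a penalty tied to how
    often that pair has been visited so far.
    trajectory: iterable of (s, a, r, s_next) tuples from one sample path.
    """
    Q = np.zeros((S, A))
    N = np.zeros((S, A))
    for s, a, r, s_next in trajectory:
        N[s, a] += 1
        eta = 1.0 / (1.0 + N[s, a])          # rescaled linear step size
        b = c / np.sqrt(N[s, a])             # visitation-based penalty
        Q[s, a] += eta * (r + gamma * Q[s_next].max() - b - Q[s, a])
    return Q
```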

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Gu, Y Chi - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …
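
For intuition, the optimism principle underlying this line of work can be shown with a generic UCB-Hoeffding Q-learning step in the style of Jin et al. (2018); the paper's regret-optimal algorithm adds variance reduction via reference values on top, which this sketch deliberately omits:

```python
import numpy as np

def ucb_q_update(Q, N, s, a, r, s_next, h, H, c=1.0):
    """One optimistic Q-learning step for episodic finite-horizon RL
    (generic UCB-Hoeffding flavor, not the paper's refined variant).

    Q, N: (H, S, A) arrays of stage-dependent values and visit counts;
    h is the current stage, H the horizon.
    """
    N[h, s, a] += 1
    n = N[h, s, a]
    eta = (H + 1) / (H + n)                  # step size from Jin et al. (2018)
    bonus = c * np.sqrt(H**3 / n)            # optimism bonus (log factors elided)
    v_next = Q[h + 1, s_next].max() if h + 1 < H else 0.0
    target = r + v_next + bonus
    Q[h, s, a] = min((1 - eta) * Q[h, s, a] + eta * target, H)  # cap at H
    return Q, N
```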

Optimal multi-distribution learning

Z Zhang, W Zhan, Y Chen, SS Du… - The Thirty Seventh Annual Conference on Learning Theory, 2024 - proceedings.mlr.press
Multi-distribution learning (MDL), which seeks to learn a shared model that
minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a …
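
The MDL objective is a minimax problem, minimizing over models the maximum risk over the k distributions, which is naturally approached as a two-player game: the learner descends a weighted risk while an adversary reweights distributions toward the worst case. A self-contained sketch with squared loss and a linear model; all names and step sizes here are illustrative, not the paper's optimal algorithm:

```python
import numpy as np

def worst_case_erm(datasets, dim, lr=0.05, eta=0.5, steps=500):
    """Minimax sketch of multi-distribution learning: min_theta max_i R_i(theta).

    The learner takes gradient steps on a weighted risk while the adversary
    reweights the k distributions by multiplicative weights (Hedge), pushing
    mass onto whichever distribution currently has the highest risk.
    datasets: list of k (X_i, y_i) pairs; squared loss, linear model.
    """
    k = len(datasets)
    theta = np.zeros(dim)
    w = np.ones(k) / k
    for _ in range(steps):
        risks = np.array([np.mean((X @ theta - y) ** 2) for X, y in datasets])
        w = w * np.exp(eta * risks)          # adversary: Hedge on per-distribution risks
        w /= w.sum()
        grad = sum(w_i * 2 * X.T @ (X @ theta - y) / len(y)
                   for w_i, (X, y) in zip(w, datasets))
        theta -= lr * grad                   # learner: descend the weighted risk
    return theta, w
```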

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual Conference on Learning Theory, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

Reward-agnostic fine-tuning: Provable statistical benefits of hybrid reinforcement learning

G Li, W Zhan, JD Lee, Y Chi… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes
access to both an offline dataset and online interactions with the unknown environment. A …
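
The basic object in the hybrid setting is an empirical model built by pooling offline and online transitions; the paper's contribution is a reward-agnostic rule for deciding where to explore online based on what the offline data fails to cover, which this minimal pooling sketch (tabular assumptions, illustrative names) does not attempt:

```python
import numpy as np

def merged_empirical_model(offline_counts, online_counts):
    """Pool offline and online transition counts into one empirical model,
    the basic object a hybrid-RL algorithm plans with (sketch only).

    Both arguments are (S, A, S) visit-count arrays; returns P_hat(s'|s,a).
    """
    counts = offline_counts + online_counts
    n = counts.sum(axis=2, keepdims=True)
    return counts / np.maximum(n, 1)         # empirical transition kernel
```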