Authors
Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen
Publication date
2024/1
Journal
Operations Research
Volume
72
Issue
1
Pages
203-221
Publisher
INFORMS
Description
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider γ-discounted infinite-horizon Markov decision processes (MDPs) with state space 𝒮 and action space 𝒜. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |𝒮||𝒜|/(1−γ)². The current paper overcomes this barrier by certifying the minimax optimality of two algorithms—a perturbed model-based algorithm and a conservative model-based algorithm—as soon as the sample size exceeds the order of |𝒮||𝒜|/(1−γ) (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time …
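The generative-model setting described in the abstract can be illustrated with a generic plug-in (model-based) approach: draw N independent next-state samples per state–action pair, form the empirical transition kernel, and run value iteration on the resulting empirical MDP. The sketch below is a minimal illustration of this plug-in idea only; it is not the paper's perturbed or conservative model-based algorithms, and all function and variable names here are ad hoc.

```python
import numpy as np

def plug_in_value_iteration(sample_transitions, rewards, gamma=0.9, iters=1000):
    """Plug-in model-based approach under a generative model (illustrative sketch).

    sample_transitions: int array of shape (S, A, N) — for each (s, a), N i.i.d.
        next-state draws from the simulator.
    rewards: float array of shape (S, A) — known deterministic rewards.
    Returns the value-iteration estimates (V, Q) on the empirical MDP.
    """
    S, A, N = sample_transitions.shape
    # Empirical transition probabilities P_hat[s, a, s'] from sample frequencies.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            next_states, counts = np.unique(sample_transitions[s, a], return_counts=True)
            P_hat[s, a, next_states] = counts / N
    # Value iteration on the empirical MDP; contracts at rate gamma.
    V = np.zeros(S)
    for _ in range(iters):
        Q = rewards + gamma * (P_hat @ V)  # shape (S, A)
        V = Q.max(axis=1)
    return V, Q
```

The statistical question the paper studies is how large N must be for the value of the empirical MDP to be close to that of the true MDP; the plug-in estimator itself is the standard baseline that the paper's perturbed and conservative variants refine.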
Scholar articles
G Li, Y Wei, Y Chi, Y Gu, Y Chen - Advances in neural information processing systems, 2020
G Li, Y Wei, Y Chi, Y Chen - Operations Research, 2024