Authors
Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen
Publication date
2024/1
Journal
Operations Research
Volume
72
Issue
1
Pages
203-221
Publisher
INFORMS
Description
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider γ-discounted infinite-horizon Markov decision processes (MDPs) with state space S and action space A. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A|/(1−γ)². The current paper overcomes this barrier by certifying the minimax optimality of two algorithms—a perturbed model-based algorithm and a conservative model-based algorithm—as soon as the sample size exceeds the order of |S||A|/(1−γ) (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time …
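The model-based (plug-in) approach underlying both certified algorithms can be sketched in a few lines: draw N independent next-state samples per state-action pair from the generative model, form the empirical transition kernel, and plan (e.g., by value iteration) on the empirical MDP. The sketch below is illustrative only, assuming a hypothetical sampler `sample_next_state(s, a)`; it omits the paper's perturbation/conservatism refinements.

```python
import numpy as np

def empirical_mdp_planning(sample_next_state, S, A, r, gamma, N, iters=500):
    """Plug-in planning with a generative model (illustrative sketch).

    sample_next_state(s, a) -> one draw from the true kernel P(. | s, a).
    Builds the empirical kernel P_hat from N samples per (s, a), then runs
    value iteration on the empirical MDP to return a greedy policy and Q.
    """
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(N):
                P_hat[s, a, sample_next_state(s, a)] += 1.0 / N
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)            # greedy value of current Q
        Q = r + gamma * (P_hat @ V)  # Bellman update on the empirical MDP
    return Q.argmax(axis=1), Q
```

The paper's contribution is a statistical guarantee for (perturbed/conservative variants of) exactly this pipeline: the planned policy is ε-optimal once the total sample budget N·|S||A| reaches the minimax-optimal order, rather than the larger barrier required by prior analyses.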
Total citations
(per-year citation chart, 2019–2024; counts garbled in extraction)