查看文章

arxiv.org 中的 [PDF]

On function approximation in reinforcement learning: Optimism in the face of large state spaces

作者

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I Jordan

发表日期

2020/11/9

期刊

arXiv preprint arXiv:2011.04622

简介

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural networks, and indeed there have been many empirical successes that have exploited such combinations in large-scale applications. There are profound challenges, however, in developing a theory to support this enterprise, most notably the need to take into consideration the exploration-exploitation tradeoff at the core of RL in conjunction with the computational and statistical tradeoffs that arise in modern function-approximation-based learning systems. We approach these challenges by studying an optimistic modification of the least-squares value iteration algorithm, in the context of the action-value function represented by a kernel function or an overparameterized neural network. We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an regret, where characterizes the intrinsic complexity of the function class , is the length of each episode, and is the total number of episodes. Our regret bounds are independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL.

引用总数

被引用次数：115

202020212022202320241 25 34 32 23

学术搜索中的文章

On function approximation in reinforcement learning: Optimism in the face of large state spaces

Z Yang, C Jin, Z Wang, M Wang, MI Jordan - arXiv preprint arXiv:2011.04622, 2020

被引用次数：77 相关文章所有 7 个版本

Provably efficient reinforcement learning with kernel and neural function approximations*

Z Yang, C Jin, Z Wang, M Wang, M Jordan - Advances in Neural Information Processing Systems, 2020

被引用次数：47 相关文章所有 2 个版本