Is Q-learning minimax optimal? A tight sample complexity analysis

G Li, C Cai, Y Chen, Y Wei, Y Chi - Operations Research, 2024 - pubsonline.informs.org
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP)
in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the …
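For orientation, a minimal sketch of the tabular synchronous Q-learning the paper analyzes is given below; the `sample_next_state` and `reward` callables and the rescaled-linear step-size schedule are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def synchronous_q_learning(sample_next_state, reward, n_states, n_actions,
                           gamma=0.9, n_iters=1000):
    """Tabular synchronous Q-learning with a generative model: at every iteration,
    each (s, a) pair receives one fresh next-state sample and a stochastic
    Bellman update. The step size below is an illustrative schedule."""
    Q = np.zeros((n_states, n_actions))
    for t in range(1, n_iters + 1):
        lr = 1.0 / (1.0 + (1.0 - gamma) * t)
        for s in range(n_states):
            for a in range(n_actions):
                s_next = sample_next_state(s, a)            # draw s' ~ P(. | s, a)
                target = reward(s, a) + gamma * Q[s_next].max()
                Q[s, a] = (1.0 - lr) * Q[s, a] + lr * target
    return Q
```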

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
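As a rough illustration of the asynchronous setting in the snippet, the sketch below runs plain Q-learning along a single Markovian trajectory generated by a fixed behavior policy; `env_step`, `behavior_policy`, and the visit-count step size are assumptions for illustration, and the pessimism (lower-confidence-bound) ingredient studied in the paper is not shown.

```python
import numpy as np

def asynchronous_q_learning(env_step, behavior_policy, s0, n_states, n_actions,
                            gamma=0.9, n_steps=10000):
    """Asynchronous Q-learning: follow one Markovian trajectory produced by a fixed
    behavior policy and update only the (state, action) pair just visited."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions), dtype=int)
    s = s0
    for _ in range(n_steps):
        a = behavior_policy(s)                  # action chosen by the behavior policy
        r, s_next = env_step(s, a)              # one transition of the Markov chain
        visits[s, a] += 1
        lr = 1.0 / (1.0 + visits[s, a])         # illustrative visit-count step size
        Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q
```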

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Gu, Y Chi - Advances in Neural …, 2021 - proceedings.neurips.cc
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu… - Advances in Neural …, 2020 - proceedings.neurips.cc
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …
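The variance-reduction idea named in the title can be pictured as an SVRG-style recentering of the stochastic Bellman target. The single-step sketch below is hypothetical: `T_ref` stands in for a batch estimate of the Bellman operator at a frozen reference table, and the epoch structure of the actual algorithm is omitted.

```python
import numpy as np

def vr_q_update(Q, Q_ref, T_ref, s, a, r, s_next, gamma, lr):
    """One variance-reduced Q-learning step: the noisy target is recentered around a
    batch estimate T_ref of the Bellman operator applied to a frozen reference Q_ref,
    which lowers the variance of the stochastic update."""
    noisy_new = r + gamma * Q[s_next].max()        # stochastic target at the current Q
    noisy_ref = r + gamma * Q_ref[s_next].max()    # same sample evaluated at Q_ref
    target = noisy_new - noisy_ref + T_ref[s, a]
    Q[s, a] = (1.0 - lr) * Q[s, a] + lr * target
    return Q

# Illustrative call; in practice T_ref would be estimated from a batch of samples.
Q = np.zeros((3, 2)); Q_ref = np.zeros((3, 2)); T_ref = np.zeros((3, 2))
Q = vr_q_update(Q, Q_ref, T_ref, s=0, a=1, r=1.0, s_next=2, gamma=0.9, lr=0.5)
```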

UCB Momentum Q-learning: Correcting the bias without forgetting

P Ménard, OD Domingues, X Shang… - … on Machine Learning, 2021 - proceedings.mlr.press
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new
algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic …
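UCBMQ belongs to the family of optimistic Q-learning methods; the sketch below shows a generic UCB-bonus Q-learning update with the step size common in episodic analyses. The bonus form, the constant `c`, and the discounted target are simplifying assumptions, and UCBMQ's momentum/bias-correction term is not reproduced here.

```python
import numpy as np

def optimistic_q_update(Q, counts, s, a, r, s_next, gamma, H, c=1.0):
    """Generic optimistic (UCB-style) Q-learning update: the exploration bonus shrinks
    as the visit count of (s, a) grows. This is not UCBMQ's exact recursion."""
    counts[s, a] += 1
    n = counts[s, a]
    lr = (H + 1.0) / (H + n)                    # step size common in episodic Q-learning
    bonus = c * np.sqrt(H / n)                  # illustrative upper-confidence bonus
    target = r + bonus + gamma * Q[s_next].max()
    Q[s, a] = (1.0 - lr) * Q[s, a] + lr * target
    return Q, counts
```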

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu, Y Chen - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …

Finite-time analysis for double Q-learning

H Xiong, L Zhao, Y Liang… - Advances in neural …, 2020 - proceedings.neurips.cc
Although Q-learning is one of the most successful algorithms for finding the best action-
value function (and thus the optimal policy) in reinforcement learning, its implementation …
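The rule analyzed here is the classical double Q-learning update: two tables are maintained, the greedy next action is selected with one table but evaluated with the other, which counteracts the overestimation bias of the max operator. A minimal tabular sketch follows; the step size and the toy call are illustrative.

```python
import numpy as np
import random

def double_q_update(QA, QB, s, a, r, s_next, gamma, lr):
    """One tabular double Q-learning step: with equal probability, update one of the
    two estimators, selecting the greedy action with it but evaluating that action
    with the other estimator."""
    if random.random() < 0.5:
        a_star = QA[s_next].argmax()            # action selected by QA ...
        QA[s, a] += lr * (r + gamma * QB[s_next, a_star] - QA[s, a])  # ... evaluated by QB
    else:
        b_star = QB[s_next].argmax()
        QB[s, a] += lr * (r + gamma * QA[s_next, b_star] - QB[s, a])
    return QA, QB

# Toy call on a 3-state, 2-action problem.
QA, QB = np.zeros((3, 2)), np.zeros((3, 2))
QA, QB = double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2, gamma=0.9, lr=0.5)
```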

Online target Q-learning with reverse experience replay: Efficiently finding the optimal policy for linear MDPs

N Agarwal, S Chaudhuri, P Jain, D Nagaraj… - arXiv preprint arXiv …, 2021 - arxiv.org
Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in
practice with function approximation (Mnih et al., 2015). In contrast, existing theoretical …
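The two ingredients named in the title, a frozen target estimate and experience replayed in reverse order, can be sketched in tabular form as below; the paper itself analyzes them with linear function approximation, and the buffer, step size, and phase count here are illustrative assumptions.

```python
import numpy as np

def q_learning_target_reverse_replay(buffer, Q, gamma=0.9, lr=0.1, n_phases=5):
    """Tabular stand-in for online target Q-learning with reverse experience replay:
    within each phase the target table is frozen, and the stored transitions are
    replayed from most recent to oldest."""
    for _ in range(n_phases):
        Q_target = Q.copy()                         # freeze the target for this phase
        for (s, a, r, s_next) in reversed(buffer):  # reverse experience replay
            Q[s, a] += lr * (r + gamma * Q_target[s_next].max() - Q[s, a])
    return Q

# Toy call: a hand-made buffer on a 2-state, 2-action problem.
transitions = [(0, 1, 0.0, 1), (1, 0, 1.0, 0)]
Q_hat = q_learning_target_reverse_replay(transitions, np.zeros((2, 2)))
```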

Accelerating value iteration with anchoring

J Lee, E Ryu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Value Iteration (VI) is foundational to the theory and practice of modern reinforcement
learning, and it is known to converge at a $\mathcal{O}(\gamma^k)$-rate. Surprisingly …
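The anchoring mechanism referred to in the snippet can be pictured as averaging each Bellman-updated iterate with the fixed starting point; the sketch below uses an illustrative anchoring-weight schedule and a toy Bellman operator, not necessarily the paper's exact choices.

```python
import numpy as np

def anchored_value_iteration(bellman_op, V0, n_iters=100):
    """Anchored VI sketch: each iterate is a convex combination of the Bellman update
    and the fixed anchor V0, with an anchoring weight that decays over iterations."""
    V = V0.copy()
    for k in range(1, n_iters + 1):
        beta = 1.0 / (k + 1)                    # illustrative anchoring weight
        V = beta * V0 + (1.0 - beta) * bellman_op(V)
    return V

# Toy 2-state chain with a single action: reward [0, 1], both states move to state 1.
gamma = 0.9
r = np.array([0.0, 1.0])
P = np.array([[0.0, 1.0], [0.0, 1.0]])
bellman = lambda V: r + gamma * P @ V           # Bellman operator for this toy chain
V_hat = anchored_value_iteration(bellman, np.zeros(2))
```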

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Chi - … and Inference: A Journal of the IMA, 2023 - academic.oup.com
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …