Related articles - Academic resource search

A distributional analysis of sampling-based reinforcement learning algorithms

P Amortila, D Precup, P Panangaden… - International …, 2020 - proceedings.mlr.press

We present a distributional approach to theoretical analyses of reinforcement learning
algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple …

Cited by 15 | Related articles | All 10 versions
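A distributional view of constant step sizes can be illustrated with the simplest constant-step-size stochastic approximation. This is a minimal sketch under our own assumptions and not the paper's analysis. It estimates the mean μ of Gaussian samples X_t with θ_{t+1} = θ_t + α(X_t - θ_t). Because α stays fixed, the iterates never settle on μ. Instead they form a Markov chain that fluctuates around μ. In this linear-Gaussian case the stationary variance solves v = (1 - α)²v + α²σ², which gives v = ασ²/(2 - α). The function names and parameter values are hypothetical.

```python
import random
import statistics


def constant_step_iterates(mu, sigma, alpha, n_steps, seed=0):
    """Run theta_{t+1} = theta_t + alpha * (X_t - theta_t) with X_t ~ N(mu, sigma^2).

    With a constant step size the iterates do not converge to mu; they
    fluctuate around it with a stationary distribution.
    """
    rng = random.Random(seed)
    theta = 0.0
    trace = []
    for _ in range(n_steps):
        x = rng.gauss(mu, sigma)        # noisy sample of the target mean
        theta += alpha * (x - theta)    # constant step-size update
        trace.append(theta)
    return trace


def stationary_variance(sigma, alpha):
    # Fixed point of v = (1 - alpha)^2 * v + alpha^2 * sigma^2
    return alpha * sigma ** 2 / (2.0 - alpha)


if __name__ == "__main__":
    mu, sigma, alpha = 2.0, 1.0, 0.1
    trace = constant_step_iterates(mu, sigma, alpha, n_steps=200_000)
    tail = trace[1_000:]  # discard burn-in from theta_0 = 0
    print("empirical mean:", statistics.fmean(tail))
    print("empirical variance:", statistics.pvariance(tail))
    print("predicted variance:", stationary_variance(sigma, alpha))
```

With α = 0.1 and σ = 1, the predicted stationary variance is 0.1/1.9 ≈ 0.053. A smaller α tightens the distribution around μ, but the iterates then move more slowly away from their starting value.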

[PDF] psu.edu

Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness to the E³ Algorithm

C Domingo - International Conference on Algorithmic Learning …, 1999 - Springer

Recently, Kearns and Singh presented the first provably efficient and near-optimal
algorithm for reinforcement learning in general Markov decision processes. One of the key …

Cited by 12 | Related articles | All 12 versions

[PDF] mcgovern-fagg.org

[PDF] Policy gradient vs. value function approximation: A reinforcement learning shootout

J Beitelspacher, J Fager, G Henriques… - School of Computer …, 2006 - mcgovern-fagg.org

This paper compares the performance of policy gradient techniques with traditional value
function approximation methods for reinforcement learning in a difficult problem domain. We …

Cited by 12 | Related articles | All 5 versions

Q-learning with uniformly bounded variance

AM Devraj, SP Meyn - IEEE Transactions on Automatic Control, 2021 - ieeexplore.ieee.org

Sample complexity bounds are a common performance metric in the reinforcement learning
literature. In the discounted cost, infinite horizon setting, all of the known bounds can be …

Cited by 14 | Related articles | All 2 versions

[PDF] psu.edu

[PDF] Direct gradient-based reinforcement learning: I. gradient estimation algorithms

J Baxter, PL Bartlett - 1999 - Citeseer

Despite their many empirical successes, approximate value-function based approaches to
reinforcement learning suffer from a paucity of theoretical guarantees on the performance of …

Cited by 122 | Related articles | All 8 versions

[PDF] mlr.press

On the convergence of policy iteration-based reinforcement learning with Monte Carlo policy evaluation

A Winnicki, R Srikant - International Conference on Artificial …, 2023 - proceedings.mlr.press

A common technique in reinforcement learning is to evaluate the value function from Monte
Carlo simulations of a given policy, and use the estimated value function to obtain a new …

Cited by 5 | Related articles | All 5 versions
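The scheme in the snippet can be sketched on a toy problem: estimate values from Monte Carlo rollouts of the current policy, then act greedily on those estimates. Every detail here is an assumption for illustration. The environment is a hypothetical five-state chain where a move goes the opposite way with 10% probability and each step costs a reward of -1. Rollouts are truncated at a fixed horizon. Q(s, a) is estimated by starting rollouts from every state-action pair. This is not the specific algorithm analyzed in the paper.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
TERMINAL = N_STATES - 1
LEFT, RIGHT = 0, 1
SLIP = 0.1            # probability that a move goes the opposite way
GAMMA = 0.95
HORIZON = 50          # truncate rollouts so non-terminating policies still return


def step(state, action, rng):
    """Move one cell (clamped to the chain); reward is -1 per step."""
    direction = 1 if action == RIGHT else -1
    if rng.random() < SLIP:
        direction = -direction
    next_state = min(max(state + direction, 0), TERMINAL)
    return next_state, -1.0


def rollout_return(state, action, policy, rng):
    """Discounted return of taking `action` in `state`, then following `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= GAMMA
        if state == TERMINAL:
            break
        action = policy[state]
    return total


def mc_q_estimates(policy, n_rollouts, rng):
    """Monte Carlo estimate of Q^pi(s, a) for every non-terminal state and action."""
    q = {}
    for s in range(TERMINAL):
        for a in (LEFT, RIGHT):
            returns = [rollout_return(s, a, policy, rng) for _ in range(n_rollouts)]
            q[s, a] = sum(returns) / n_rollouts
    return q


def mc_policy_iteration(n_iterations=10, n_rollouts=300, seed=0):
    rng = random.Random(seed)
    policy = [LEFT] * TERMINAL  # deliberately poor initial policy
    for _ in range(n_iterations):
        q = mc_q_estimates(policy, n_rollouts, rng)       # policy evaluation (Monte Carlo)
        policy = [max((LEFT, RIGHT), key=lambda a: q[s, a])  # greedy improvement
                  for s in range(TERMINAL)]
    return policy


if __name__ == "__main__":
    print(mc_policy_iteration())
```

Starting from the all-LEFT policy, the greedy step first fixes the state next to the goal. The correction then propagates backward about one state per iteration. With too few rollouts, noisy estimates can make the greedy step choose the wrong action wherever the two Q-values are close.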

[PDF] arxiv.org

The optimal reward baseline for gradient-based reinforcement learning

L Weaver, N Tao - arXiv preprint arXiv:1301.2315, 2013 - arxiv.org

There exist a number of reinforcement learning algorithms which learn by climbing the
gradient of expected reward. Their long-run convergence has been proved, even in partially …

Cited by 313 | Related articles | All 10 versions
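A small example shows how a reward baseline affects the gradient estimate. This is an illustration under assumed values, not the derivation in the paper. It uses a softmax policy over three actions with fixed rewards. It computes the exact mean and total variance of the score-function estimator g = (r(a) - b)∇_θ log π(a), where ∇_θ log π(a) = e_a - π. Subtracting a constant b leaves the mean unchanged, because E[∇_θ log π(a)] = 0. The variance, however, can change. The logits, rewards, and helper names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def estimator_moments(logits, rewards, baseline):
    """Exact mean vector and total variance of g = (r(a) - b) * grad log pi(a), a ~ pi.

    For a softmax policy, grad_theta log pi(a) = e_a - pi.
    """
    pi = softmax(logits)
    k = len(pi)
    mean = [0.0] * k
    second_moment = 0.0
    for a in range(k):
        score = [(1.0 if j == a else 0.0) - pi[j] for j in range(k)]
        g = [(rewards[a] - baseline) * s for s in score]
        mean = [m + pi[a] * gj for m, gj in zip(mean, g)]
        second_moment += pi[a] * sum(gj * gj for gj in g)
    total_variance = second_moment - sum(m * m for m in mean)
    return mean, total_variance


if __name__ == "__main__":
    logits, rewards = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]
    for b in (0.0, 2.0):  # no baseline vs. expected reward as baseline
        print(b, estimator_moments(logits, rewards, b))
```

For uniform logits and rewards (1, 2, 3), the mean gradient is (-1/3, 0, 1/3) with either baseline. The total variance drops from 26/9 with b = 0 to 2/9 with b = 2, the expected reward. In this example ‖∇ log π(a)‖² is the same for every action. The variance-minimizing constant baseline, E[r‖∇ log π‖²]/E[‖∇ log π‖²], therefore coincides with the expected reward here, although the two differ in general.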

[PDF] psu.edu

A unified analysis of value-function-based reinforcement-learning algorithms

C Szepesvári, ML Littman - Neural computation, 1999 - direct.mit.edu

Reinforcement learning is the problem of generating optimal behavior in a sequential
decision-making environment given the opportunity of interacting with it. Many algorithms for …

Cited by 252 | Related articles | All 20 versions

[PDF] rub.de

[PDF] Reinforcement learning in a nutshell

V Heidrich-Meisner, M Lauer, C Igel, MA Riedmiller - ESANN, 2007 - homepage.rub.de

V. Heidrich-Meisner¹, M. Lauer², C. Igel¹ and M. Riedmiller²; ¹Institut für Neuroinformatik …

Cited by 55 | Related articles | All 13 versions

[PDF] researchgate.net

An analysis of reinforcement learning with function approximation

FS Melo, SP Meyn, MI Ribeiro - … of the 25th international conference on …, 2008 - dl.acm.org

We address the problem of computing the optimal Q-function in Markov decision problems
with infinite state-space. We analyze the convergence properties of several variations of Q …

Cited by 326 | Related articles | All 18 versions
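With a linear approximation Q_θ(s, a) = θᵀφ(s, a), the Q-learning update is θ ← θ + α[r + γ max_b θᵀφ(s′, b) - θᵀφ(s, a)]φ(s, a). The sketch below implements that update on a hypothetical two-state MDP in which action a moves to state a. Landing in state 1 earns reward 1, γ = 0.9, and the behavior policy is uniformly random. It uses one-hot features, the tabular special case, where convergence to Q* is well established. With general features the same update can diverge, so convergence conditions are the kind of question the paper addresses. All names and values are illustrative assumptions.

```python
import random

GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2


def env_step(state, action):
    """Hypothetical toy MDP: action a moves to state a; reward 1 for landing in state 1."""
    next_state = action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def one_hot_features(state, action):
    """phi(s, a) as one-hot features: the tabular special case of linear approximation."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi


def q_value(theta, phi):
    return sum(t * f for t, f in zip(theta, phi))


def linear_q_learning(features, n_steps=50_000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(features(0, 0))
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(N_ACTIONS)  # uniformly random (off-policy) behavior
        next_state, reward = env_step(state, action)
        phi = features(state, action)
        # Q-learning target bootstraps from the greedy action at the next state.
        target = reward + GAMMA * max(
            q_value(theta, features(next_state, b)) for b in range(N_ACTIONS)
        )
        td_error = target - q_value(theta, phi)
        # Semi-gradient update: theta <- theta + alpha * td_error * phi(s, a)
        theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
        state = next_state
    return theta


if __name__ == "__main__":
    print(linear_q_learning(one_hot_features))
```

In this MDP, Q*(s, 1) = 1 + 0.9 · 10 = 10 and Q*(s, 0) = 0.9 · 10 = 9 for both states. The parameters should therefore approach [9, 10, 9, 10]. Passing a different `features` function is one way to experiment with non-tabular approximations.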
