A distributional analysis of sampling-based reinforcement learning algorithms

P Amortila, D Precup, P Panangaden… - International …, 2020 - proceedings.mlr.press
We present a distributional approach to theoretical analyses of reinforcement learning
algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple …

Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness to the E3 Algorithm

C Domingo - International Conference on Algorithmic Learning …, 1999 - Springer
Recently, Kearns and Singh presented the first provably efficient and near-optimal
algorithm for reinforcement learning in general Markov decision processes. One of the key …

Policy gradient vs. value function approximation: A reinforcement learning shootout

J Beitelspacher, J Fager, G Henriques… - School of Computer …, 2006 - mcgovern-fagg.org
This paper compares the performance of policy gradient techniques with traditional value
function approximation methods for reinforcement learning in a difficult problem domain. We …

Q-learning with uniformly bounded variance

AM Devraj, SP Meyn - IEEE Transactions on Automatic Control, 2021 - ieeexplore.ieee.org
Sample complexity bounds are a common performance metric in the reinforcement learning
literature. In the discounted cost, infinite horizon setting, all of the known bounds can be …

Direct gradient-based reinforcement learning: I. gradient estimation algorithms

J Baxter, PL Bartlett - 1999 - Citeseer
Despite their many empirical successes, approximate value-function based approaches to
reinforcement learning suffer from a paucity of theoretical guarantees on the performance of …

On the convergence of policy iteration-based reinforcement learning with Monte Carlo policy evaluation

A Winnicki, R Srikant - International Conference on Artificial …, 2023 - proceedings.mlr.press
A common technique in reinforcement learning is to evaluate the value function from Monte
Carlo simulations of a given policy, and use the estimated value function to obtain a new …
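The scheme described in this snippet (estimate a policy's value from Monte Carlo rollouts, then act greedily with respect to that estimate) can be sketched on a toy problem. The two-state MDP, slip probability, discount, horizon and rollout count below are hypothetical choices for illustration; this is the generic technique, not the specific algorithm or setting analysed by Winnicki and Srikant.

```python
import random

# Hypothetical toy MDP (illustration only): states {0, 1}, actions {0, 1}.
# Action 0 = "stay", action 1 = "switch"; with probability SLIP the effect is flipped.
# The reward is 1 whenever the next state is 1, else 0.
SLIP = 0.1
GAMMA = 0.8
HORIZON = 40        # truncation; GAMMA**40 is about 1e-4, so the tail is negligible
N_ROLLOUTS = 500    # Monte Carlo samples per (state, action) pair


def step(state, action, rng):
    """Sample the next state and reward of the toy MDP."""
    switch = action == 1
    if rng.random() < SLIP:
        switch = not switch
    next_state = 1 - state if switch else state
    return next_state, float(next_state == 1)


def mc_q_estimate(policy, state, action, rng):
    """Average discounted return of taking `action` in `state`, then following `policy`."""
    total = 0.0
    for _ in range(N_ROLLOUTS):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(HORIZON):
            s, r = step(s, a, rng)
            ret += discount * r
            discount *= GAMMA
            a = policy[s]  # after the first step, actions come from the policy
        total += ret
    return total / N_ROLLOUTS


def mc_policy_iteration(policy, seed=0, max_iters=20):
    """Alternate Monte Carlo evaluation and greedy improvement until the policy is stable."""
    rng = random.Random(seed)
    policy = list(policy)
    for _ in range(max_iters):
        # Evaluation: sampled Q-values of the current policy.
        q = [[mc_q_estimate(policy, s, a, rng) for a in (0, 1)] for s in (0, 1)]
        # Improvement: greedy policy with respect to the estimates.
        new_policy = [max((0, 1), key=lambda a: q[s][a]) for s in (0, 1)]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, q
```

Starting from the worst policy `[0, 1]`, `mc_policy_iteration([0, 1])` should settle on `[1, 0]` (switch out of state 0, stay in state 1) after one improvement step. Because the Q-values are sampled, far fewer rollouts or a smaller action gap could in principle make the greedy step pick the wrong action.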

The optimal reward baseline for gradient-based reinforcement learning

L Weaver, N Tao - arXiv preprint arXiv:1301.2315, 2013 - arxiv.org
There exist a number of reinforcement learning algorithms which learn by climbing the
gradient of expected reward. Their long-run convergence has been proved, even in partially …
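The gradient these algorithms climb can be estimated from samples with the likelihood-ratio (REINFORCE-style) estimator. A minimal sketch, assuming a hypothetical three-armed bandit with deterministic rewards and a softmax policy: it shows that subtracting a constant baseline leaves the estimate unbiased while changing its variance. Using the mean reward as the baseline is an illustrative choice here, not a reproduction of the optimal baseline derived by Weaver and Tao.

```python
import math
import random

# Hypothetical 3-armed bandit with deterministic rewards (illustration only).
REWARDS = [1.0, 2.0, 4.0]


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(theta):
    """d/dtheta_k of sum_a pi(a) r(a), which equals pi(k) * (r(k) - expected reward)."""
    pi = softmax(theta)
    avg = sum(p * r for p, r in zip(pi, REWARDS))
    return [p * (r - avg) for p, r in zip(pi, REWARDS)]


def reinforce_samples(theta, baseline, n, rng):
    """Draw n likelihood-ratio gradient samples (r(a) - b) * grad log pi(a)."""
    pi = softmax(theta)
    samples = []
    for _ in range(n):
        a = rng.choices(range(len(pi)), weights=pi)[0]
        # For a softmax policy, grad_k log pi(a) = 1[k == a] - pi(k).
        g = [(1.0 if k == a else 0.0) - pi[k] for k in range(len(pi))]
        samples.append([(REWARDS[a] - baseline) * gk for gk in g])
    return samples


def mean_and_total_variance(samples):
    """Componentwise sample mean and the sum of componentwise variances."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    var = sum(sum((s[k] - mean[k]) ** 2 for s in samples) / n for k in range(dim))
    return mean, var
```

With `theta = [0.0, 0.0, 0.0]` the exact gradient is (-4/9, -1/9, 5/9). Both estimators should average to roughly that, but by direct calculation the total variance of a single sample is about 4.15 with no baseline and about 0.52 with the mean reward (7/3) as baseline.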

A unified analysis of value-function-based reinforcement-learning algorithms

C Szepesvári, ML Littman - Neural computation, 1999 - direct.mit.edu
Reinforcement learning is the problem of generating optimal behavior in a sequential
decision-making environment given the opportunity of interacting with it. Many algorithms for …

Reinforcement learning in a nutshell

V Heidrich-Meisner, M Lauer, C Igel, MA Riedmiller - ESANN, 2007 - homepage.rub.de
V. Heidrich-Meisner¹, M. Lauer², C. Igel¹ and M. Riedmiller²; ¹Institut für Neuroinformatik …

An analysis of reinforcement learning with function approximation

FS Melo, SP Meyn, MI Ribeiro - … of the 25th international conference on …, 2008 - dl.acm.org
We address the problem of computing the optimal Q-function in Markov decision problems
with infinite state-space. We analyze the convergence properties of several variations of Q …