We revisit the standard formulation of the tabular actor-critic algorithm as a two-time-scale stochastic approximation, with the value function computed on a faster time-scale and the policy …
Y Luo, J Hu, A Gosavi - Annals of Operations Research, 2024 - Springer
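The two-time-scale scheme mentioned in the snippet above can be sketched in minimal tabular form. The 2-state, 2-action MDP, the step-size exponents, and the softmax policy parameterization below are illustrative assumptions, not the authors' exact setup; the point is only that the critic's step size decays more slowly (faster time-scale) than the actor's.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, used purely for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s']: transition kernel
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],                   # R[s, a]: expected rewards
              [0.0, 2.0]])
gamma = 0.95
rng = np.random.default_rng(0)

theta = np.zeros((2, 2))   # actor: softmax action preferences
V = np.zeros(2)            # critic: tabular value function

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for k in range(1, 20001):
    alpha = 1.0 / k**0.6   # critic step size: decays slowly (fast time-scale)
    beta = 1.0 / k**0.9    # actor step size: decays quickly (slow time-scale)
    pi = policy(s)
    a = rng.choice(2, p=pi)
    s_next = rng.choice(2, p=P[s, a])
    delta = R[s, a] + gamma * V[s_next] - V[s]   # TD(0) error
    V[s] += alpha * delta                        # critic update (faster)
    grad = -pi
    grad[a] += 1.0                               # grad of log pi(a|s) wrt theta[s]
    theta[s] += beta * delta * grad              # actor update (slower)
    s = s_next
```

Because beta/alpha → 0, the critic effectively tracks the value of the current policy while the actor drifts on the slower scale, which is the separation the two-time-scale analysis exploits.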
Reinforcement learning (RL) is an exciting area within the domain of Markov Decision Processes (MDPs) in which the underlying optimization problem is solved either in a …
In this letter, we derive a generalization of the Speedy Q-learning (SQL) algorithm, which was proposed in the Reinforcement Learning (RL) literature to handle the slow convergence of …
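For context, SQL keeps two successive Q-iterates and combines their Bellman backups, which accelerates the vanilla Q-learning recursion. The sketch below is a deterministic simplification on a hypothetical 2-state, 2-action MDP: it applies the exact Bellman optimality operator where SQL proper uses a sampled (empirical) one, so it illustrates only the shape of the update.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, for illustration only.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s']
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                  # R[s, a]
gamma = 0.9

def T(Q):
    # Exact Bellman optimality operator (SQL proper replaces this
    # with a sampled/empirical backup from observed transitions).
    return R + gamma * np.einsum('ijk,k->ij', P, Q.max(axis=1))

Q_prev = np.zeros((2, 2))
Q = np.zeros((2, 2))
for k in range(1, 20001):
    alpha = 1.0 / (k + 1)
    TQ_prev, TQ = T(Q_prev), T(Q)
    # SQL update: small step toward the old backup, plus an aggressive
    # (1 - alpha)-weighted correction from the change in backups.
    Q_next = Q + alpha * (TQ_prev - Q) + (1 - alpha) * (TQ - TQ_prev)
    Q_prev, Q = Q, Q_next
```

The (1 − alpha) term is what distinguishes SQL from standard Q-learning, which would use only the alpha-weighted step toward the current backup.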
A large number of systems are well-modeled by Markov Decision Processes (MDPs). In particular, certain wireless communication networks and biological networks admit such …
A Gopalan, G Thoppe - arXiv preprint arXiv:2205.13617, 2022 - arxiv.org
Q-learning and SARSA with $\epsilon$-greedy exploration are leading reinforcement learning methods. Their tabular forms converge to the optimal Q-function under reasonable …
A Gopalan, G Thoppe - ICML 2024 Workshop: Aligning Reinforcement … - openreview.net
For a Reinforcement Learning (RL) algorithm to be practically useful, the policy it estimates in the limit must be superior to the initial guess, at least on average. In this work, we show …