The average-reward criterion is comparatively understudied, as most existing work in the reinforcement-learning literature considers the discounted-reward criterion. There are few …
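For reference, the average-reward criterion these works study is the long-run reward rate of a policy, paired with a differential (bias) value function in place of a discounted return. A minimal statement in standard notation, not taken from any single entry listed here:

```latex
% Average reward (gain) of a stationary policy \pi, and the
% differential action-value function built from reward deviations:
r(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[ \sum_{t=1}^{T} R_t \right],
\qquad
q_\pi(s,a) \;=\; \mathbb{E}_\pi\!\left[ \sum_{t=1}^{\infty} \bigl( R_t - r(\pi) \bigr) \,\Big|\, S_0 = s,\; A_0 = a \right].
```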
Y Wan, H Yu, RS Sutton - arXiv preprint arXiv:2408.16262, 2024 - arxiv.org
This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes (MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on …
We show that two average-reward off-policy control algorithms, Differential Q-learning (Wan, Naik, & Sutton, 2021a) and RVI Q-learning (Abounadi, Bertsekas, & Borkar, 2001), converge in …
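For context, the tabular Differential Q-learning update from Wan, Naik & Sutton (2021a) maintains a running reward-rate estimate alongside the action values, with both driven by the same TD error. A minimal sketch; the environment interface (`env.reset`, `env.step`, `env.sample_action`) and hyperparameter values are illustrative assumptions, not from the papers above:

```python
import numpy as np

def differential_q_learning(env, n_states, n_actions,
                            alpha=0.1, eta=1.0, epsilon=0.1, steps=100_000):
    """Tabular Differential Q-learning (Wan, Naik & Sutton, 2021a).

    Q holds differential action-value estimates; r_bar estimates the
    average reward. The TD error subtracts r_bar instead of discounting.
    """
    Q = np.zeros((n_states, n_actions))
    r_bar = 0.0                       # running estimate of the reward rate
    s = env.reset()
    for _ in range(steps):
        # epsilon-greedy behavior policy (this is off-policy control)
        a = env.sample_action() if np.random.rand() < epsilon else Q[s].argmax()
        s_next, r = env.step(a)       # continuing task: no terminal states
        delta = r - r_bar + Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * delta
        r_bar += eta * alpha * delta  # reward-rate estimate tracks the TD error
        s = s_next
    return Q, r_bar
```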
This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free …
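The mixing idea can be pictured with a Dyna-style loop that interleaves a model-free update on each real transition with several planning updates replayed from a learned model. This is a generic illustration of background planning (here with discounting, Dyna-Q style), not the specific algorithm of the paper above; `Q` is assumed initialized as a dict of per-state action-value dicts covering all states:

```python
import random

def dyna_q_step(Q, model, s, a, r, s_next,
                alpha=0.1, gamma=0.95, n_planning=10):
    """One Dyna-Q step: a model-free Q-learning update on the real
    transition, then n background planning updates that replay
    transitions from the learned (tabular, deterministic) model."""
    # model-free update from real experience
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])
    model[(s, a)] = (r, s_next)  # remember the observed transition
    # background planning: DP-style updates on remembered state-action pairs
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps_next].values()) - Q[ps][pa])
    return Q, model
```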
H Yu, Y Wan, RS Sutton - arXiv preprint arXiv:2409.03915, 2024 - arxiv.org
This paper studies asynchronous stochastic approximation (SA) algorithms and their application to reinforcement learning in semi-Markov decision processes (SMDPs) with an …
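The asynchronous SA recursions analyzed in this line of work update only a subset of components at each step, each component on its own local clock. Schematically, in the standard (Borkar-style) form, which is a generic template rather than the paper's exact assumptions:

```latex
% Asynchronous stochastic approximation: at step n only components i
% in the active set Y_n are updated, each with its own update count.
x_{n+1}(i) \;=\; x_n(i) \;+\; \alpha_{\nu(n,i)} \bigl( h_i(x_n) + M_{n+1}(i) \bigr)\, \mathbb{1}\{\, i \in Y_n \,\},
```

where $\nu(n,i)$ counts the updates of component $i$ up to time $n$ and $M_{n+1}$ is a noise term.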
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike …
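For background, LMDPs (in Todorov's formulation) linearize the Bellman equation under the substitution $z = e^{-v}$, and in the infinite-horizon average-cost setting the solution reduces to a principal-eigenvalue problem. A standard statement of that background, not necessarily the notation of the paper above:

```latex
% Linearly-solvable MDP, average-cost case: with desirability
% z(s) = e^{-v(s)}, state cost q(s), and passive dynamics \bar{p},
% the Bellman equation is linear in z and the average cost is
% recovered from the principal eigenvalue as \bar{c} = -\log \lambda:
\lambda\, z(s) \;=\; e^{-q(s)} \sum_{s'} \bar{p}(s' \mid s)\, z(s').
```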
The average-reward formulation is a natural and important way to pose learning and planning problems, yet it has received much less attention than the episodic and discounted …
H Yu, Y Wan, RS Sutton - arXiv preprint arXiv:2312.15091, 2023 - arxiv.org
In this paper, we study asynchronous stochastic approximation algorithms without communication delays. Our main contribution is a stability proof for these algorithms that …