Three dogmas of reinforcement learning

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org
Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Off-policy average reward actor-critic with deterministic policy search

N Saxena, S Khastagir, S Kolathaya… - International …, 2023 - proceedings.mlr.press
The average reward criterion is relatively less studied as most existing works in the
Reinforcement Learning literature consider the discounted reward criterion. There are few …

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

Y Wan, H Yu, RS Sutton - arXiv preprint arXiv:2408.16262, 2024 - arxiv.org
This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes
(MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on …

On convergence of average-reward off-policy control algorithms in weakly communicating MDPs

Y Wan, RS Sutton - arXiv preprint arXiv:2209.15141, 2022 - arxiv.org
We show two average-reward off-policy control algorithms, Differential Q-learning (Wan,
Naik, & Sutton 2021a) and RVI Q-learning (Abounadi, Bertsekas, & Borkar 2001), converge in …

Goal-space planning with subgoal models

C Lo, K Roice, PM Panahi, SM Jordan, A White… - Journal of Machine …, 2024 - jmlr.org
This paper investigates a new approach to model-based reinforcement learning using
background planning: mixing (approximate) dynamic programming updates and model-free …

A New View on Planning in Online Reinforcement Learning

K Roice, PM Panahi, SM Jordan, A White… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper investigates a new approach to model-based reinforcement learning using
background planning: mixing (approximate) dynamic programming updates and model-free …

Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning

H Yu, Y Wan, RS Sutton - arXiv preprint arXiv:2409.03915, 2024 - arxiv.org
This paper studies asynchronous stochastic approximation (SA) algorithms and their
application to reinforcement learning in semi-Markov decision processes (SMDPs) with an …

Hierarchical Average-Reward Linearly-Solvable Markov Decision Processes

G Infante, A Jonsson, V Gómez - ECAI 2024, 2024 - ebooks.iospress.nl
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable
Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike …

Learning and Planning with the Average-Reward Formulation

Y Wan - 2023 - era.library.ualberta.ca
The average-reward formulation is a natural and important formulation of learning and
planning problems, yet has received much less attention than the episodic and discounted …

A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays

H Yu, Y Wan, RS Sutton - arXiv preprint arXiv:2312.15091, 2023 - arxiv.org
In this paper, we study asynchronous stochastic approximation algorithms without
communication delays. Our main contribution is a stability proof for these algorithms that …