Average-reward learning and planning with options

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org

Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Cited by 2 Related articles All 6 versions

[PDF] mlr.press

Off-policy average reward actor-critic with deterministic policy search

N Saxena, S Khastagir, S Kolathaya… - International …, 2023 - proceedings.mlr.press

The average reward criterion is relatively less studied as most existing works in the
Reinforcement Learning literature consider the discounted reward criterion. There are few …

Cited by 5 Related articles All 12 versions

[PDF] arxiv.org

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

Y Wan, H Yu, RS Sutton - arXiv preprint arXiv:2408.16262, 2024 - arxiv.org

This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes
(MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on …

Cited by 1 Related articles All 2 versions

[PDF] arxiv.org

On convergence of average-reward off-policy control algorithms in weakly communicating MDPs

Y Wan, RS Sutton - arXiv preprint arXiv:2209.15141, 2022 - arxiv.org

We show two average-reward off-policy control algorithms, Differential Q-learning (Wan,
Naik, & Sutton 2021a) and RVI Q-learning (Abounadi Bertsekas & Borkar 2001), converge in …

Cited by 3 Related articles All 6 versions

[PDF] jmlr.org

Goal-space planning with subgoal models

C Lo, K Roice, PM Panahi, SM Jordan, A White… - Journal of Machine …, 2024 - jmlr.org

This paper investigates a new approach to model-based reinforcement learning using
background planning: mixing (approximate) dynamic programming updates and model-free …

Cited by 3 Related articles All 7 versions

[PDF] arxiv.org

A New View on Planning in Online Reinforcement Learning

K Roice, PM Panahi, SM Jordan, A White… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper investigates a new approach to model-based reinforcement learning using
background planning: mixing (approximate) dynamic programming updates and model-free …

A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays

H Yu, Y Wan, RS Sutton - arXiv preprint arXiv:2312.15091, 2023 - arxiv.org

In this paper, we study asynchronous stochastic approximation algorithms without
communication delays. Our main contribution is a stability proof for these algorithms that …

Cited by 2 Related articles All 2 versions
