On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

Y Wan, H Yu, RS Sutton - arXiv preprint arXiv:2408.16262, 2024 - arxiv.org
This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes
(MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on …

Iterative Option Discovery for Planning, by Planning

K Young, RS Sutton - arXiv preprint arXiv:2310.01569, 2023 - arxiv.org
Discovering useful temporal abstractions, in the form of options, is widely thought to be key
to applying reinforcement learning and planning to increasingly complex domains. Building …

Learning and Planning with the Average-Reward Formulation

Y Wan - 2023 - era.library.ualberta.ca
The average-reward formulation is a natural and important formulation of learning and
planning problems, yet has received much less attention than the episodic and discounted …

Autonomous Skill Acquisition for Robots Using Graduated Learning

G Vasan - Proceedings of the 23rd International Conference on …, 2024 - ifaamas.org
Skill acquisition is among the most remarkable aspects of human intelligence. It involves
discovering purposeful behavioural modules, retaining them as skills, honing them through …

Goal Space Planning with Reward Shaping

K Roice - 2024 - era.library.ualberta.ca
Planning and goal-conditioned reinforcement learning aim to create more efficient and
scalable methods for complex, long-horizon tasks. These approaches break tasks into …