Toward discovering options that achieve faster planning

Citing articles: 5 results

[PDF] arxiv.org

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

Y Wan, H Yu, RS Sutton - arXiv preprint arXiv:2408.16262, 2024 - arxiv.org

This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes
(MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Iterative Option Discovery for Planning, by Planning

K Young, RS Sutton - arXiv preprint arXiv:2310.01569, 2023 - arxiv.org

Discovering useful temporal abstractions, in the form of options, is widely thought to be key
to applying reinforcement learning and planning to increasingly complex domains. Building …

Learning and Planning with the Average-Reward Formulation

Y Wan - 2023 - era.library.ualberta.ca

The average-reward formulation is a natural and important formulation of learning and
planning problems, yet has received much less attention than the episodic and discounted …

Cited by 2 · Related articles · All 3 versions

[PDF] ifaamas.org

[PDF] Autonomous Skill Acquisition for Robots Using Graduated Learning

G Vasan - Proceedings of the 23rd International Conference on …, 2024 - ifaamas.org

Skill acquisition is among the most remarkable aspects of human intelligence. It involves
discovering purposeful behavioural modules, retaining them as skills, honing them through …

Goal Space Planning with Reward Shaping

K Roice - 2024 - era.library.ualberta.ca

Planning and goal-conditioned reinforcement learning aim to create more efficient and
scalable methods for complex, long-horizon tasks. These approaches break tasks into …
