Compositional planning using optimal option models

M Hutsebaut-Buysse, K Mets, S Latré - Machine Learning and Knowledge …, 2022 - mdpi.com

Reinforcement learning (RL) allows an agent to solve sequential decision-making problems
by interacting with an environment in a trial-and-error fashion. When these environments are …

Cited by 107 Related articles All 8 versions

[HTML] nih.gov

The algorithmic anatomy of model-based evaluation

ND Daw, P Dayan - … Transactions of the Royal Society B …, 2014 - royalsocietypublishing.org

Despite many debates in the first half of the twentieth century, it is now largely a truism that
humans and other animals build models of their environments and use them for prediction …

Cited by 283 Related articles All 15 versions

[PDF] arxiv.org

Reinforcement learning with unsupervised auxiliary tasks

M Jaderberg, V Mnih, WM Czarnecki, T Schaul… - arXiv preprint arXiv …, 2016 - arxiv.org

Deep reinforcement learning agents have achieved state-of-the-art results by directly
maximising cumulative reward. However, environments contain a much wider variety of …

Cited by 1466 Related articles All 7 versions

[PDF] aaai.org

The option-critic architecture

PL Bacon, J Harb, D Precup - Proceedings of the AAAI conference on …, 2017 - ojs.aaai.org

Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
While planning with temporally extended actions is well understood, creating such …

Cited by 1341 Related articles All 14 versions

[PDF] jair.org

From skills to symbols: Learning symbolic representations for abstract high-level planning

G Konidaris, LP Kaelbling, T Lozano-Perez - Journal of Artificial Intelligence …, 2018 - jair.org

We consider the problem of constructing abstract representations for planning in
high-dimensional, continuous environments. We assume an agent equipped with a collection of …

Cited by 372 Related articles All 8 versions

[PDF] arxiv.org

Variational intrinsic control

K Gregor, DJ Rezende, D Wierstra - arXiv preprint arXiv:1611.07507, 2016 - arxiv.org

In this paper we introduce a new unsupervised reinforcement learning method for
discovering the set of intrinsic options available to an agent. This set is learned by …

Cited by 482 Related articles All 3 versions

[PDF] openreview.net

Option discovery using deep skill chaining

A Bagaria, G Konidaris - International Conference on Learning …, 2019 - openreview.net

Autonomously discovering temporally extended actions, or skills, is a longstanding goal of
hierarchical reinforcement learning. We propose a new algorithm that combines skill …

Cited by 135 Related articles All 10 versions

[PDF] royalsocietypublishing.org

Model-based hierarchical reinforcement learning and human action control

M Botvinick, A Weinstein - Philosophical Transactions of …, 2014 - royalsocietypublishing.org

Recent work has reawakened interest in goal-directed or 'model-based' choice, where
decisions are based on prospective evaluation of potential action outcomes. Concurrently …

Cited by 230 Related articles All 7 versions

[PDF] springer.com

Probabilistic inference for determining options in reinforcement learning

C Daniel, H Van Hoof, J Peters, G Neumann - Machine Learning, 2016 - Springer

Tasks that require many sequential decisions or complex solutions are hard to solve using
conventional reinforcement learning algorithms. Based on the semi Markov decision …

Cited by 153 Related articles All 18 versions

[PDF] neurips.cc

Learning abstract options

M Riemer, M Liu, G Tesauro - Advances in neural …, 2018 - proceedings.neurips.cc

Building systems that autonomously create temporal abstractions from data is a key
challenge in scaling learning and planning in reinforcement learning. One popular approach …

Cited by 101 Related articles All 7 versions
