Recent advances in hierarchical reinforcement learning

AG Barto, S Mahadevan - Discrete Event Dynamic Systems, 2003 - Springer
Reinforcement learning is bedeviled by the curse of dimensionality: the number of
parameters to be learned grows exponentially with the size of any compact encoding of a …

Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective

MM Botvinick, Y Niv, AG Barto - Cognition, 2009 - Elsevier
Research on human and animal behavior has long emphasized its hierarchical structure—
the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask …

Zero-shot task generalization with multi-task deep reinforcement learning

J Oh, S Singh, H Lee, P Kohli - International Conference on …, 2017 - proceedings.mlr.press
As a step towards developing zero-shot task generalization capabilities in reinforcement
learning (RL), we introduce a new RL problem where the agent should learn to execute …

Intrinsically motivated reinforcement learning

N Chentanez, A Barto, S Singh - Advances in neural …, 2004 - proceedings.neurips.cc
Psychologists call behavior intrinsically motivated when it is engaged in for its own sake
rather than as a step toward solving a specific problem of clear practical value. But what we …

Habits, action sequences and reinforcement learning

A Dezfouli, BW Balleine - European Journal of Neuroscience, 2012 - Wiley Online Library
It is now widely accepted that instrumental actions can be either goal‐directed or habitual;
whereas the former are rapidly acquired and regulated by their outcome, the latter are …

Intrinsically motivated learning of hierarchical collections of skills

AG Barto, S Singh, N Chentanez - Proceedings of the 3rd International …, 2004 - Citeseer
Humans and other animals often engage in activities for their own sakes rather than as steps
toward solving practical problems. Psychologists call these intrinsically motivated behaviors …

Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

S Mahadevan, M Maggioni - Journal of Machine Learning Research, 2007 - jmlr.org
This paper introduces a novel spectral framework for solving Markov decision processes
(MDPs) by jointly learning representations and optimal policies. The major components of …

Using relative novelty to identify useful temporal abstractions in reinforcement learning

Ö Şimşek, AG Barto - Proceedings of the twenty-first international …, 2004 - dl.acm.org
We present a new method for automatically creating useful temporal abstractions in
reinforcement learning. We argue that states that allow the agent to transition to a different …

Skill discovery for exploration and planning using deep skill graphs

A Bagaria, JK Senthil… - … Conference on Machine …, 2021 - proceedings.mlr.press
We introduce a new skill-discovery algorithm that builds a discrete graph representation of
large continuous MDPs, where nodes correspond to skill subgoals and the edges to skill …

Investigating contingency awareness using Atari 2600 games

M Bellemare, J Veness, M Bowling - … of the AAAI Conference on Artificial …, 2012 - ojs.aaai.org
Contingency awareness is the recognition that some aspects of a future observation are
under an agent's control while others are solely determined by the environment. This paper …