This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's …
For many tasks, tactile or visual feedback is helpful or even crucial. However, designing controllers that take such high-dimensional feedback into account is non-trivial. Therefore …
In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in the infinite-horizon Markov decision …
Collective behaviours such as swarm formation of autonomous agents offer the advantages of efficient movement, redundancy, and potential for human guidance of a single swarm …
Autonomous robots that can assist humans in situations of daily life have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. In this thesis, we …
In this paper, we consider the problem of planning in the infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy …
Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that …
We present a new class of structured reinforcement learning policy-architectures, Implicit Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of …