Natural actor-critic

J Peters, S Schaal - Neurocomputing, 2008 - Elsevier
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-
Critic. The actor updates are achieved using stochastic policy gradients employing Amari's …

Natural actor-critic

J Peters, S Vijayakumar, S Schaal - … Learning, Porto, Portugal, October 3-7 …, 2005 - Springer
This paper investigates a novel model-free reinforcement learning architecture, the Natural
Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's …

Stable reinforcement learning with autoencoders for tactile and visual data

H Van Hoof, N Chen, M Karl… - 2016 IEEE/RSJ …, 2016 - ieeexplore.ieee.org
For many tasks, tactile or visual feedback is helpful or even crucial. However, designing
controllers that take such high-dimensional feedback into account is non-trivial. Therefore …

[PDF] Dynamic policy programming

MG Azar, V Gómez, HJ Kappen - The Journal of Machine Learning …, 2012 - jmlr.org
In this paper, we propose a novel policy iteration method, called dynamic policy
programming (DPP), to estimate the optimal policy in the infinite-horizon Markov decision …

Automatic collective motion tuning using actor-critic deep reinforcement learning

S Abpeikar, K Kasmarik, M Garratt, R Hunjet… - Swarm and Evolutionary …, 2022 - Elsevier
Collective behaviours such as swarm formation of autonomous agents offer the advantages
of efficient movement, redundancy, and potential for human guidance of a single swarm …

[BOOK] Machine learning of motor skills for robotics

JR Peters - 2007 - search.proquest.com
Autonomous robots that can assist humans in situations of daily life have been a long
standing vision of robotics, artificial intelligence, and cognitive sciences. In this thesis, we …

Dynamic policy programming with function approximation

MG Azar, V Gómez, B Kappen - Proceedings of the …, 2011 - proceedings.mlr.press
In this paper, we consider the problem of planning in the infinite-horizon discounted-reward
Markov decision problems. We propose a novel iterative method, called dynamic policy …

Non-parametric policy search with limited information loss

H Van Hoof, G Neumann, J Peters - Journal of Machine Learning Research, 2017 - jmlr.org
Learning complex control policies from non-linear and redundant sensory input is an
important challenge for reinforcement learning algorithms. Non-parametric methods that …

[HTML] Research progress in reinforcement learning-based autonomous driving control technology

潘峰, 鲍泓 - 2021 - cjig.cn
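Abstract: An autonomous vehicle is in essence a wheeled mobile robot, an integrated system that combines pattern recognition, environment perception,
planning and decision-making, and intelligent control. Advances in artificial intelligence and machine learning have greatly advanced autonomous …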

Implicit Two-Tower Policies

Y Zhao, Q Pan, K Choromanski, D Jain… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a new class of structured reinforcement learning policy-architectures, Implicit
Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of …