An introduction to reinforcement learning theory: Value function methods

J Peters, S Schaal - Neurocomputing, 2008 - Elsevier

In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's …

Cited by 1127 · Related articles · All 15 versions
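The snippet is cut off before it explains the update, so this is only a toy illustration of the natural-gradient idea named in the title. It is not the authors' algorithm. As we understand the natural actor-critic literature, the natural gradient can be estimated as the slope of a least-squares fit of rewards on the "compatible" features ∇θ log π(a). The sketch assumes a one-step task, a Gaussian policy N(μ, σ²) with fixed σ, and a made-up reward −(a − 2)². The names `natural_gradient_estimate` and `train` are hypothetical.

```python
import random


def natural_gradient_estimate(mu, sigma, reward, n_samples, rng):
    """Sample-based natural-gradient estimate for the mean of N(mu, sigma^2).

    Fits reward ~ w * psi + b by least squares, where
    psi = d/dmu log pi(a) = (a - mu) / sigma**2 is the compatible feature.
    Because E[psi] = 0 and Var[psi] = 1 / sigma**2 (the Fisher information),
    the slope w estimates F^{-1} * dJ/dmu, i.e. the natural gradient.
    """
    psis, rewards = [], []
    for _ in range(n_samples):
        a = rng.gauss(mu, sigma)            # sample an action from the policy
        psis.append((a - mu) / sigma ** 2)  # score of the Gaussian mean
        rewards.append(reward(a))
    mean_psi = sum(psis) / n_samples
    mean_r = sum(rewards) / n_samples
    cov = sum((p - mean_psi) * (q - mean_r) for p, q in zip(psis, rewards))
    var = sum((p - mean_psi) ** 2 for p in psis)
    return cov / var                        # least-squares slope


def train(mu=0.0, sigma=0.5, alpha=0.5, iters=50, n_samples=200, seed=0):
    """Natural-gradient ascent on the policy mean (sigma stays fixed)."""
    rng = random.Random(seed)

    def reward(a):
        return -(a - 2.0) ** 2              # hypothetical reward, peak at a = 2

    for _ in range(iters):
        mu += alpha * natural_gradient_estimate(mu, sigma, reward, n_samples, rng)
    return mu


print(train())  # should end close to the reward peak at 2.0
```

With σ = 0.5, the Fisher information for μ is 1/σ² = 4. The natural step is therefore σ² times the vanilla gradient. The regression produces this rescaling without forming or inverting a Fisher matrix explicitly.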

[PDF] tu-darmstadt.de

Natural actor-critic

J Peters, S Vijayakumar, S Schaal - … Learning, Porto, Portugal, October 3-7 …, 2005 - Springer

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's …

Cited by 486 · Related articles · All 25 versions

[PDF] academia.edu

Stable reinforcement learning with autoencoders for tactile and visual data

H Van Hoof, N Chen, M Karl… - 2016 IEEE/RSJ …, 2016 - ieeexplore.ieee.org

For many tasks, tactile or visual feedback is helpful or even crucial. However, designing
controllers that take such high-dimensional feedback into account is non-trivial. Therefore …

Cited by 179 · Related articles · All 10 versions

[PDF] jmlr.org

[PDF] Dynamic policy programming

MG Azar, V Gómez, HJ Kappen - The Journal of Machine Learning …, 2012 - jmlr.org

In this paper, we propose a novel policy iteration method, called dynamic policy
programming (DPP), to estimate the optimal policy in the infinite-horizon Markov decision …

Cited by 150 · Related articles · All 17 versions
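The DPP snippets say only that it is a policy-iteration method for infinite-horizon MDPs. The sketch below is our reading of the tabular DPP recursion. It is written from memory, not taken from this listing, so treat the exact update as an assumption and check it against the paper. Under that reading, action preferences are updated as Ψ(x,a) ← Ψ(x,a) − M_ηΨ(x) + r(x,a) + γ Σ_y P(y|x,a) M_ηΨ(y). Here M_η is the Boltzmann (softmax-weighted) average over actions, and the policy is π(a|x) ∝ exp(ηΨ(x,a)). The two-state MDP, the values of γ and η, and all function names are hypothetical.

```python
import math


def boltzmann_average(psi_x, eta):
    """M_eta: softmax(eta * psi)-weighted average of the preferences at one state."""
    top = max(psi_x)  # shift by the max so exp() cannot overflow
    weights = [math.exp(eta * (p - top)) for p in psi_x]
    return sum(w * p for w, p in zip(weights, psi_x)) / sum(weights)


def dpp(P, r, gamma=0.9, eta=1.0, iters=300):
    """Assumed DPP recursion:
    Psi(x,a) <- Psi(x,a) - M Psi(x) + r(x,a) + gamma * sum_y P(y|x,a) * M Psi(y)."""
    n_states, n_actions = len(r), len(r[0])
    psi = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        m = [boltzmann_average(psi[x], eta) for x in range(n_states)]
        psi = [[psi[x][a] - m[x] + r[x][a]
                + gamma * sum(P[x][a][y] * m[y] for y in range(n_states))
                for a in range(n_actions)]
               for x in range(n_states)]
    return psi


def policy(psi_x, eta):
    """Boltzmann policy: pi(a|x) proportional to exp(eta * Psi(x,a))."""
    top = max(psi_x)
    weights = [math.exp(eta * (p - top)) for p in psi_x]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical 2-state, 2-action MDP: action 0 = stay, action 1 = move.
# Reward 1 only for staying in state 1, so the optimal policy is to move
# from state 0 and stay in state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, move -> 1
     [[0.0, 1.0], [1.0, 0.0]]]  # from state 1: stay -> 1, move -> 0
r = [[0.0, 0.0],
     [1.0, 0.0]]

psi = dpp(P, r)
for x in range(2):
    print(x, [round(p, 3) for p in policy(psi[x], eta=1.0)])
```

In this toy run, the preferences for suboptimal actions fall further behind with every iteration. The Boltzmann policy therefore becomes nearly deterministic: move in state 0, stay in state 1.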

Automatic collective motion tuning using actor-critic deep reinforcement learning

S Abpeikar, K Kasmarik, M Garratt, R Hunjet… - Swarm and Evolutionary …, 2022 - Elsevier

Collective behaviours such as swarm formation of autonomous agents offer the advantages
of efficient movement, redundancy, and potential for human guidance of a single swarm …

Cited by 15 · Related articles

[PDF] mpg.de

[BOOK] Machine learning of motor skills for robotics

JR Peters - 2007 - search.proquest.com

Autonomous robots that can assist humans in situations of daily life have been a long-standing vision of robotics, artificial intelligence, and cognitive sciences. In this thesis, we …

Cited by 103 · Related articles · All 12 versions

[PDF] mlr.press

Dynamic policy programming with function approximation

MG Azar, V Gómez, B Kappen - Proceedings of the …, 2011 - proceedings.mlr.press

In this paper, we consider the problem of planning in the infinite-horizon discounted-reward
Markov decision problems. We propose a novel iterative method, called dynamic policy …

Cited by 53 · Related articles · All 11 versions

[PDF] jmlr.org

Non-parametric policy search with limited information loss

H Van Hoof, G Neumann, J Peters - Journal of Machine Learning Research, 2017 - jmlr.org

Learning complex control policies from non-linear and redundant sensory input is an
important challenge for reinforcement learning algorithms. Non-parametric methods that …

Cited by 28 · Related articles · All 14 versions

[HTML] cjig.cn

[HTML] Research progress on reinforcement-learning-based control techniques for autonomous driving (强化学习的自动驾驶控制技术研究进展)

F Pan, H Bao - 2021 - cjig.cn

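Abstract: An autonomous vehicle is essentially a wheeled mobile robot, an integrated system that combines pattern recognition, environment perception, planning and decision-making, and intelligent control. Advances in artificial intelligence and machine learning have greatly advanced autonomous …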

Cited by 4 · Related articles · All 4 versions

[PDF] arxiv.org

Implicit Two-Tower Policies

Y Zhao, Q Pan, K Choromanski, D Jain… - arXiv preprint arXiv …, 2022 - arxiv.org

We present a new class of structured reinforcement learning policy-architectures, Implicit
Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of …

Cited by 1 · Related articles · All 4 versions
