Expected policy gradients

A survey on policy search algorithms for learning robot controllers in a handful of trials

K Chatzilygeroudis, V Vassiliades… - IEEE Transactions …, 2019 - ieeexplore.ieee.org

Most policy search (PS) algorithms require thousands of training episodes to find an
effective policy, which is often infeasible with a physical robot. This survey article focuses on …

Cited by 205 · Related articles · All 17 versions

Maximum a posteriori policy optimisation

A Abdolmaleki, JT Springenberg, Y Tassa… - arXiv preprint arXiv …, 2018 - arxiv.org

We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy
Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show …

Cited by 548 · Related articles · All 4 versions

Softmax deep double deterministic policy gradients

L Pan, Q Cai, L Huang - Advances in neural information …, 2020 - proceedings.neurips.cc

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep
Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can …

Cited by 107 · Related articles · All 7 versions

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc

Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Cited by 174 · Related articles · All 9 versions

Deep active inference as variational policy gradients

B Millidge - Journal of Mathematical Psychology, 2020 - Elsevier

Active Inference is a theory arising from theoretical neuroscience which casts action and
planning as Bayesian inference problems to be solved by minimizing a single quantity—the …

Cited by 124 · Related articles · All 6 versions

Credit assignment for collective multiagent RL with global rewards

DT Nguyen, A Kumar, HC Lau - Advances in neural …, 2018 - proceedings.neurips.cc

Scaling decision theoretic planning to large multiagent systems is challenging due to
uncertainty and partial observability in the environment. We focus on a multiagent planning …

Cited by 122 · Related articles · All 10 versions

Deep multi-agent reinforcement learning for decentralized continuous cooperative control

CS de Witt, B Peng, PA Kamienny, P Torr… - arXiv preprint arXiv …, 2020 - beipeng.github.io

Deep multi-agent reinforcement learning (MARL) holds the promise of automating many
real-world cooperative robotic manipulation and transportation tasks. Nevertheless …

Cited by 96 · Related articles

Importance sampling techniques for policy optimization

AM Metelli, M Papini, N Montali, M Restelli - Journal of Machine Learning …, 2020 - jmlr.org

How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …

Cited by 61 · Related articles · All 6 versions

Proximal policy optimization with policy feedback

Y Gu, Y Cheng, CLP Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Proximal policy optimization (PPO) is a deep reinforcement learning algorithm based on the
actor–critic (AC) architecture. In the classic AC architecture, the Critic (value) network is used …

Cited by 83 · Related articles · All 2 versions

DAC: The double actor-critic architecture for learning options

S Zhang, S Whiteson - Advances in Neural Information …, 2019 - proceedings.neurips.cc

We reformulate the option framework as two parallel augmented MDPs. Under this novel
formulation, all policy optimization algorithms can be used off the shelf to learn intra-option …

Cited by 93 · Related articles · All 13 versions
