policy gradients deep reinforcement - Academic Resource Search

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

SS Gu, T Lillicrap, RE Turner… - Advances in neural …, 2017 - proceedings.neurips.cc

… We introduce a parameterized family of policy gradient methods that interpolate between
on-policy and off-policy learning. Such methods are in general biased, but we show that the …

Cited by 191 Related articles All 16 versions

[PDF] mlr.press

Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution

PW Chou, D Maturana… - … conference on machine …, 2017 - proceedings.mlr.press

… In the following, we show that the Beta policy is bias-free and a better choice than the … policy
gradient of both policies and show that as with the Gaussian policy, Natural Policy Gradient …

Cited by 195 Related articles All 4 versions

[BOOK][B] Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

M Lapan - 2018 - books.google.com

… You will evaluate methods including Cross-entropy and policy gradients, before applying
them to real-… This book is an introduction to deep reinforcement learning (RL) and requires no …

Cited by 269 Related articles All 5 versions

[PDF] arxiv.org

A closer look at deep policy gradients

A Ilyas, L Engstrom, S Santurkar, D Tsipras… - arXiv preprint arXiv …, 2018 - arxiv.org

We study how the behavior of deep policy gradient … : gradient estimation, value prediction,
and optimization landscapes. Our results show that the behavior of deep policy gradient …

Cited by 74 Related articles All 6 versions

Deep reinforcement learning with robust deep deterministic policy gradient

T Tiong, I Saad, KTK Teo… - 2020 2nd International …, 2020 - ieeexplore.ieee.org

… policy π_φ is updated by taking the expected return gradient ∇_φ J(φ). For policy gradient
techniques, the policy is … The update is applied using deterministic policy gradient algorithm: …

Cited by 26 Related articles

[BOOK][B] … Deep Learning with Keras: Apply deep learning techniques, autoencoders, GANs, variational autoencoders, deep reinforcement learning, policy gradients, and …

R Atienza - 2018 - books.google.com

… , Policy Gradient Methods, explains how to use neural networks to learn the policy for decision
making in reinforcement learning. … policy gradient methods on a continuous action space. …

Cited by 91 Related articles All 2 versions

[PDF] arxiv.org

Soft policy gradient method for maximum entropy deep reinforcement learning

W Shi, S Song, C Wu - arXiv preprint arXiv:1909.03198, 2019 - arxiv.org

… we call Deep Soft Policy Gradient (DSPG). First, we propose soft policy gradient under
maximum … Second, soft policy gradient is combined with soft Bellman equation by employing two …

Cited by 39 Related articles All 7 versions

[PDF] neurips.cc

Evolution-guided policy gradient in reinforcement learning

S Khadka, K Tumer - Advances in Neural Information …, 2018 - proceedings.neurips.cc

… Policy gradient methods frame the goal of maximizing return … policy gradient method is
Deep Deterministic Policy Gradient (… art deep reinforcement learning algorithms of the off-policy …

Cited by 254 Related articles All 7 versions

[PDF] arxiv.org

Implementation matters in deep policy gradients: A case study on ppo and trpo

L Engstrom, A Ilyas, S Santurkar, D Tsipras… - arXiv preprint arXiv …, 2020 - arxiv.org

… in deep policy gradient algorithms through a case study on two popular algorithms: Proximal
Policy Optimization (PPO) and Trust Region Policy … gains in deep reinforcement learning. …

Cited by 210 Related articles All 3 versions

[PDF] aaai.org

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient

S Li, Y Wu, X Cui, H Dong, F Fang, S Russell - Proceedings of the AAAI …, 2019 - aaai.org

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL
tend to be brittle and sensitive to the training environment, especially in the multi-agent …

Cited by 331 Related articles All 16 versions
