Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

SS Gu, T Lillicrap, RE Turner… - Advances in neural …, 2017 - proceedings.neurips.cc
… We introduce a parameterized family of policy gradient methods that interpolate between
on-policy and off-policy learning. Such methods are in general biased, but we show that the …

Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution

PW Chou, D Maturana… - … conference on machine …, 2017 - proceedings.mlr.press
… In the following, we show that the Beta policy is bias-free and a better choice than the … policy
gradient of both policies and show that as with the Gaussian policy, Natural Policy Gradient

[BOOK][B] Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more

M Lapan - 2018 - books.google.com
… You will evaluate methods including Cross-entropy and policy gradients, before applying
them to real-… This book is an introduction to deep reinforcement learning (RL) and requires no …

A closer look at deep policy gradients

A Ilyas, L Engstrom, S Santurkar, D Tsipras… - arXiv preprint arXiv …, 2018 - arxiv.org
We study how the behavior of deep policy gradient … : gradient estimation, value prediction,
and optimization landscapes. Our results show that the behavior of deep policy gradient

Deep reinforcement learning with robust deep deterministic policy gradient

T Tiong, I Saad, KTK Teo… - 2020 2nd International …, 2020 - ieeexplore.ieee.org
policy π_φ is updated by taking the expected return gradient ∇_φ J(φ). For policy gradient
techniques, the policy is … The update is applied using deterministic policy gradient algorithm: …

[BOOK][B] … Deep Learning with Keras: Apply deep learning techniques, autoencoders, GANs, variational autoencoders, deep reinforcement learning, policy gradients, and …

R Atienza - 2018 - books.google.com
… , Policy Gradient Methods, explains how to use neural networks to learn the policy for decision
making in reinforcement learning. … policy gradient methods on a continuous action space. …

Soft policy gradient method for maximum entropy deep reinforcement learning

W Shi, S Song, C Wu - arXiv preprint arXiv:1909.03198, 2019 - arxiv.org
… we call Deep Soft Policy Gradient (DSPG). First, we propose soft policy gradient under
maximum … Second, soft policy gradient is combined with soft Bellman equation by employing two …

Evolution-guided policy gradient in reinforcement learning

S Khadka, K Tumer - Advances in Neural Information …, 2018 - proceedings.neurips.cc
Policy gradient methods frame the goal of maximizing return … policy gradient method is
Deep Deterministic Policy Gradient (… art deep reinforcement learning algorithms of the off-policy

Implementation matters in deep policy gradients: A case study on PPO and TRPO

L Engstrom, A Ilyas, S Santurkar, D Tsipras… - arXiv preprint arXiv …, 2020 - arxiv.org
… in deep policy gradient algorithms through a case study on two popular algorithms: Proximal
Policy Optimization (PPO) and Trust Region Policy … gains in deep reinforcement learning. …

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient

S Li, Y Wu, X Cui, H Dong, F Fang, S Russell - Proceedings of the AAAI …, 2019 - aaai.org
Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL
tend to be brittle and sensitive to the training environment, especially in the multi-agent …