Learning value functions in deep policy gradients using residual variance

Y Flet-Berliac, R Ouhamma, OA Maillard… - arXiv preprint arXiv …, 2020 - arxiv.org
… In addition to being well-motivated by recent studies on the behaviour of deep policy gradient
algorithms, we demonstrate that this modification is both theoretically sound and intuitively …

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

SS Gu, T Lillicrap, RE Turner… - Advances in neural …, 2017 - proceedings.neurips.cc
… w(st,at), and therefore, we call this term residual likelihood ratio gradient … policy gradient
methods, a family of policy gradient algorithms that allow mixing off-policy learning with on-policy

Residual policy learning

T Silver, K Allen, J Tenenbaum, L Kaelbling - arXiv preprint arXiv …, 2018 - arxiv.org
policy gradient methods to learn πθ even if the initial policy π is not differentiable. There
are two ways to see the role of the residual… , we use Deep Deterministic Policy Gradients (DDPG) …

Coordinate-wise control variates for deep policy gradients

Y Zhong, Y Zhou, J Peng - arXiv preprint arXiv:2107.04987, 2021 - arxiv.org
… reduction in deep policy gradient methods. We demonstrate that more variance reduction
can … Learning value functions in deep policy gradients using residual variance. In ICLR 2021-…

Compatible value gradients for reinforcement learning of continuous deep policies

D Balduzzi, M Ghifary - arXiv preprint arXiv:1509.03005, 2015 - arxiv.org
… with prior work on policy gradients – relating to the … : Deep Compatible Function Approximation
Our main result is that the deviator’s value gradient is compatible with the policy gradient

Variance reduction for policy-gradient methods via empirical variance minimization

M Kaledin, A Golubev, D Belomestny - arXiv preprint arXiv:2206.06827, 2022 - arxiv.org
Policy-gradient methods in Reinforcement Learning (RL) are very universal and widely applied
in practice but their performance suffers from the high variance of the gradient … of Deep RL…

Model-based reparameterization policy gradient methods: Theory and practical algorithms

S Zhang, B Liu, Z Wang, T Zhao - Advances in Neural …, 2024 - proceedings.neurips.cc
… model to control the gradient variance, SN has also been applied in deep RL to value
functions in order to enable deeper neural nets [8] or regulate the value-aware model error [72]. …

Learning continuous control policies by stochastic value gradients

N Heess, G Wayne, D Silver… - Advances in neural …, 2015 - proceedings.neurips.cc
policy gradient. Summing these gradients over the trajectory gives the total policy gradient.
… Since SVG(1) is model-based, we can also use Bellman residual minimization [3]. In practice…

Control regularization for reduced variance reinforcement learning

R Cheng, A Verma, G Orosz… - International …, 2019 - proceedings.mlr.press
… In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e.,
we … This work focuses on policy gradient RL methods, which estimate the gradient of the …

Expected policy gradients for reinforcement learning

K Ciosek, S Whiteson - Journal of Machine Learning Research, 2020 - jmlr.org
… of EPG based on softmax policies. We also establish a new general policy gradient
theorem, of which the stochastic and deterministic policy gradient theorems are special cases. …