residual variance deep policy gradients - Academic Resource Search

Learning value functions in deep policy gradients using residual variance

Y Flet-Berliac, R Ouhamma, OA Maillard… - arXiv preprint arXiv …, 2020 - arxiv.org

… In addition to being well-motivated by recent studies on the behaviour of deep policy gradient
algorithms, we demonstrate that this modification is both theoretically sound and intuitively …

Cited by 20 · Related articles · All 12 versions
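The title points to fitting the critic by minimizing the variance of the residual rather than its mean squared magnitude. The sketch below is a minimal illustration that assumes "residual" means the difference between an empirical return target and the critic's predicted value. The function names are hypothetical and not taken from the paper. It contrasts the two losses on predictions that are off by a constant.

```python
import numpy as np

def mse_loss(returns, values):
    """Standard critic loss: mean squared error between returns and predicted values."""
    return float(np.mean((returns - values) ** 2))

def residual_variance_loss(returns, values):
    """Variance of the residual (returns - values).

    Subtracting the mean residual removes any constant offset shared by all
    predictions, so this loss ignores such an offset, while the MSE does not.
    """
    residual = returns - values
    return float(np.mean((residual - residual.mean()) ** 2))

returns = np.array([1.0, 2.0, 3.0, 4.0])
values = returns - 10.0  # hypothetical predictions, all off by the same constant 10

print(mse_loss(returns, values))
print(residual_variance_loss(returns, values))
```

Because MSE = Var(residual) + mean(residual)^2, the residual-variance loss drops only the squared-bias term. Here the MSE is 100 while the residual variance is 0.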

[PDF] neurips.cc

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

SS Gu, T Lillicrap, RE Turner… - Advances in neural …, 2017 - proceedings.neurips.cc

… w(s_t, a_t), and therefore, we call this term residual likelihood ratio gradient… policy gradient
methods, a family of policy gradient algorithms that allow mixing off-policy learning with on-policy …

Cited by 191 · Related articles · All 16 versions

[PDF] arxiv.org

Residual policy learning

T Silver, K Allen, J Tenenbaum, L Kaelbling - arXiv preprint arXiv …, 2018 - arxiv.org

… policy gradient methods to learn π_θ even if the initial policy π is not differentiable. There
are two ways to see the role of the residual… , we use Deep Deterministic Policy Gradients (DDPG) …

Cited by 176 · Related articles · All 2 versions
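The snippet notes that π_θ can be learned even when the initial policy π is not differentiable. That is consistent with the residual formulation: the final action is the initial policy's action plus a learned correction, so gradients only need to flow through the correction. The sketch below is a minimal illustration under that reading. The clipped linear controller and the zero-initialized linear residual are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

def initial_policy(state):
    """Hypothetical hand-designed controller; it need not be differentiable."""
    return np.clip(-0.5 * state, -1.0, 1.0)

class LinearResidual:
    """Hypothetical learned correction f_theta(s) = W @ s, initialized to zero."""

    def __init__(self, dim):
        self.W = np.zeros((dim, dim))

    def __call__(self, state):
        return self.W @ state

def residual_policy(state, residual):
    """Final action = initial action + learned correction."""
    return initial_policy(state) + residual(state)

state = np.array([0.4, -4.0])
residual = LinearResidual(dim=2)
# With a zero-initialized residual, the combined policy starts out identical
# to the initial policy, so learning begins from the controller's behavior.
print(residual_policy(state, residual))
```

Only `residual.W` would be updated by a policy-gradient method such as DDPG. The initial controller is treated as a fixed black box.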

[PDF] arxiv.org

Coordinate-wise control variates for deep policy gradients

Y Zhong, Y Zhou, J Peng - arXiv preprint arXiv:2107.04987, 2021 - arxiv.org

… reduction in deep policy gradient methods. We demonstrate that more variance reduction
can … Learning value functions in deep policy gradients using residual variance. In ICLR 2021-…

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Compatible value gradients for reinforcement learning of continuous deep policies

D Balduzzi, M Ghifary - arXiv preprint arXiv:1509.03005, 2015 - arxiv.org

… with prior work on policy gradients – relating to the … : Deep Compatible Function Approximation
Our main result is that the deviator’s value gradient is compatible with the policy gradient …

Cited by 36 · Related articles · All 4 versions

[PDF] arxiv.org

Variance reduction for policy-gradient methods via empirical variance minimization

M Kaledin, A Golubev, D Belomestny - arXiv preprint arXiv:2206.06827, 2022 - arxiv.org

Policy-gradient methods in Reinforcement Learning (RL) are very universal and widely applied
in practice but their performance suffers from the high variance of the gradient … of Deep RL…

Cited by 2 · Related articles · All 2 versions
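To make "the high variance of the gradient" concrete, the sketch below measures the empirical variance of a score-function (REINFORCE) gradient estimator with and without a constant baseline. It uses a swapped-in textbook technique, a constant baseline chosen to minimize the empirical variance of the estimator, and is not claimed to be the method of the paper above. The Gaussian policy and the quadratic reward are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Gaussian policy a ~ N(mu, 1) and reward r(a) = -(a - 2)^2 + 20.
# The true gradient of E[r] with respect to mu is -2 * (mu - 2), i.e. 4 at mu = 0.
mu = 0.0
actions = rng.normal(mu, 1.0, size=10_000)
rewards = -(actions - 2.0) ** 2 + 20.0
score = actions - mu  # d/dmu of log N(a; mu, 1)

def grad_samples(baseline):
    """Per-sample score-function gradient estimates with a constant baseline."""
    return score * (rewards - baseline)

# The constant b minimizing the empirical variance of score * (rewards - b)
# is Cov(score * rewards, score) / Var(score) over the sampled actions.
x = score * rewards
b_star = np.mean((x - x.mean()) * (score - score.mean())) / np.var(score)

plain = grad_samples(0.0)
reduced = grad_samples(b_star)
print(plain.mean(), reduced.mean())  # both close to the true gradient, 4
print(plain.var(), reduced.var())    # the baseline lowers the empirical variance
```

Because the estimator is linear in b, its empirical variance is a quadratic in b, and the closed-form minimizer above can never do worse than b = 0 on the same samples.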

[PDF] neurips.cc

Model-based reparameterization policy gradient methods: Theory and practical algorithms

S Zhang, B Liu, Z Wang, T Zhao - Advances in Neural …, 2024 - proceedings.neurips.cc

… model to control the gradient variance, SN has also been applied in deep RL to value
functions in order to enable deeper neural nets [8] or regulate the value-aware model error [72]. …

Cited by 2 · Related articles · All 5 versions

[PDF] neurips.cc

Learning continuous control policies by stochastic value gradients

N Heess, G Wayne, D Silver… - Advances in neural …, 2015 - proceedings.neurips.cc

… policy gradient. Summing these gradients over the trajectory gives the total policy gradient.
… Since SVG(1) is model-based, we can also use Bellman residual minimization [3]. In practice…

Cited by 658 · Related articles · All 11 versions

[PDF] mlr.press

Control regularization for reduced variance reinforcement learning

R Cheng, A Verma, G Orosz… - International …, 2019 - proceedings.mlr.press

… In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e.,
we … This work focuses on policy gradient RL methods, which estimate the gradient of the …

Cited by 89 · Related articles · All 19 versions
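The snippet describes regularizing the deep policy toward a policy prior. The sketch below is a minimal illustration that assumes the regularization takes the form of a λ-weighted convex blend of the learned action and the prior controller's action. The function name, the parameter `lam`, and the example actions are hypothetical.

```python
import numpy as np

def regularized_action(u_learned, u_prior, lam):
    """Convex blend of a learned action and a prior controller's action.

    lam = 0 returns the learned action; larger lam pulls the result toward the prior.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return (np.asarray(u_learned) + lam * np.asarray(u_prior)) / (1.0 + lam)

u_rl = np.array([2.0, -1.0])    # hypothetical action from the deep policy
u_prior = np.array([0.0, 1.0])  # hypothetical action from the prior controller
for lam in (0.0, 1.0, 10.0):
    print(lam, regularized_action(u_rl, u_prior, lam))
```

A larger `lam` keeps the executed action closer to the prior, which trades some of the learned policy's flexibility for lower variance in behavior.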

[PDF] jmlr.org

Expected policy gradients for reinforcement learning

K Ciosek, S Whiteson - Journal of Machine Learning Research, 2020 - jmlr.org

… of EPG based on softmax policies. We also establish a new general policy gradient
theorem, of which the stochastic and deterministic policy gradient theorems are special cases. …

Cited by 48 · Related articles · All 7 versions
