L Pan,
Q Cai,
L Huang - Advances in neural information …, 2020 - proceedings.neurips.cc
A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep
Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can …