A Faust, A Francis, D Mehta - arXiv preprint arXiv:1905.07628, 2019 - academia.edu
Many continuous control tasks have easily formulated objectives, yet using them directly as
a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many …