P Parmas, T Seno, Y Aoki - International Conference on …, 2023 - proceedings.mlr.press
In model-based reinforcement learning (MBRL), policy gradients can be estimated either by
derivative-free RL methods, such as likelihood ratio gradients (LR), or by backpropagating …