Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

SM Low, A Kumar, S Sanner - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Recent advances in deep learning have enabled optimization of deep reactive policies
(DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural …

Planning in stochastic computation graphs: solving stochastic nonlinear problems with backpropagation

TP Bueno - 2021 - teses.usp.br
Deep Learning has achieved remarkable success in a range of complex perception tasks,
games, and other real-world applications. At a high level, it can be argued that the main …