A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

P Parmas, M Sugiyama - arXiv preprint arXiv:1910.06419, 2019 - arxiv.org
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout
machine and reinforcement learning; however, they are usually explained as simple …