Marginalized importance sampling for off-environment policy evaluation

P Katdare, N Jiang… - Conference on Robot …, 2023 - proceedings.mlr.press
Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging
to train and deploy RL policies on real-world robots. Even a robust policy trained in …

Towards Provable Log Density Policy Gradient

P Katdare, A Joshi, K Driggs-Campbell - arXiv preprint arXiv:2403.01605, 2024 - arxiv.org
Policy gradient methods are a vital ingredient behind the success of modern reinforcement
learning. Modern policy gradient methods, although successful, introduce a residual error in …