ReVar: Strengthening policy evaluation via reduced variance sampling

S Mukherjee, JP Hanna… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
This paper studies the problem of data collection for policy evaluation in Markov decision
processes (MDPs). In policy evaluation, we are given a target policy and asked to …

Robust on-policy sampling for data-efficient policy evaluation in reinforcement learning

R Zhong, D Zhang, L Schäfer… - Advances in Neural …, 2022 - proceedings.neurips.cc
Reinforcement learning (RL) algorithms are often categorized as either on-policy or
off-policy depending on whether they use data from a target policy of interest or from a different …