SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

S Mukherjee, JP Hanna, R Nowak - arXiv preprint arXiv:2406.02165, 2024 - arxiv.org
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a target policy …