N Kallus, X Mao, K Wang… - International Conference on Machine Learning, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions; this is crucial in applications where online experimentation is limited. However …
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies using only offline log data. It is particularly useful in applications …
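The standard workhorse behind this kind of offline evaluation is inverse propensity scoring (IPS), which reweights each logged reward by the ratio of target to logging propensities. A minimal sketch follows; the function and variable names (`ips_estimate`, `pi0`, `pi`) are illustrative, not the API of any paper above.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    rewards:               observed rewards r_i for the logged actions
    logging_propensities:  pi_0(a_i | x_i), probability the logger chose a_i
    target_propensities:   pi(a_i | x_i), target policy's probability of a_i
    """
    weights = target_propensities / logging_propensities
    return float(np.mean(weights * rewards))

# Toy log: 10,000 rounds from a uniform logger over two actions.
rng = np.random.default_rng(0)
n, k = 10000, 2
actions = rng.integers(0, k, size=n)
rewards = (actions == 1).astype(float)        # action 1 always pays 1, action 0 pays 0
pi0 = np.full(n, 1.0 / k)                     # uniform logging propensities
pi = np.where(actions == 1, 0.9, 0.1)         # target policy plays action 1 w.p. 0.9
print(ips_estimate(rewards, pi0, pi))         # ≈ 0.9, the target policy's true value
```

Because the weights correct for the mismatch between logger and target, the estimate is unbiased whenever the logging propensities are correct and cover every action the target policy can take.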
Learning an optimal policy from offline data is notoriously challenging: it requires evaluating the policy being learned using data pre-collected by a static logging policy. We …
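The evaluation step plugs directly into learning: pick the candidate policy whose off-policy value estimate is highest. Below is a minimal sketch under assumptions of my own (a uniform logger, a toy context-free policy class, an IPS objective); none of these names come from the snippet's paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 3
actions = rng.integers(0, k, size=n)          # logged by a uniform logging policy
pi0 = np.full(n, 1.0 / k)                     # uniform logging propensities
rewards = rng.binomial(1, 0.2 + 0.3 * (actions == 2)).astype(float)  # action 2 is best

def ips_value(p):
    """IPS estimate of the value of a (context-free) policy with action probs p."""
    weights = p[actions] / pi0
    return float(np.mean(weights * rewards))

# Tiny policy class: put mass 0.8 on one action, spread the rest uniformly.
candidates = []
for a in range(k):
    p = np.full(k, 0.2 / (k - 1))
    p[a] = 0.8
    candidates.append(p)

best = max(range(k), key=lambda a: ips_value(candidates[a]))
print("learned policy favors action", best)   # prints 2 with high probability
```

Real offline policy learning optimizes over far richer, context-dependent policy classes (typically by gradient methods), but the structure is the same: an off-policy value estimate serves as the training objective.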
In recent years, a new line of research has taken an interventional view of recommender systems, treating recommendations as actions that the system takes to have a …
We propose diagnostics, based on control variates, to detect data quality issues in logged bandit feedback, which is critical for accurate offline evaluation and …
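One standard control variate in this setting: if the logged propensities are correct, the importance weights have expectation exactly 1 for any target policy, so a sample mean far from 1 signals a data problem. The sketch below implements that check; it is my own illustration of the idea, and the snippet's exact diagnostics may differ.

```python
import numpy as np

def weight_diagnostic(logging_propensities, target_propensities, z=3.0):
    """Control-variate diagnostic for logged bandit feedback.

    If propensities are logged correctly, the importance weights
    w_i = pi(a_i|x_i) / pi_0(a_i|x_i) satisfy E[w] = 1, so their sample
    mean acts as a control variate: a mean more than z standard errors
    from 1 flags issues such as misrecorded propensities or missing data.
    """
    w = target_propensities / logging_propensities
    mean = w.mean()
    se = w.std(ddof=1) / np.sqrt(len(w))
    return mean, se, abs(mean - 1.0) > z * se  # True => likely data issue

# Toy check: a healthy log passes; corrupted propensities get flagged.
rng = np.random.default_rng(2)
n, k = 20000, 4
actions = rng.integers(0, k, size=n)
pi0 = np.full(n, 1.0 / k)                     # uniform logging propensities
pi = np.where(actions == 0, 0.7, 0.1)         # any fixed target policy works here
print(weight_diagnostic(pi0, pi))             # mean ≈ 1 -> passes
print(weight_diagnostic(pi0 * 0.8, pi))       # corrupted log -> flagged
```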
We study offline policy evaluation in a setting where the target policy can take actions that were not available when the data was logged. We analyze the bias of two popular …
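To see why this setting is hard, consider a minimal sketch with toy numbers of my own (not from the snippet's paper): vanilla IPS gives zero weight to an action absent from the log, so that action's reward never enters the estimate and the result is biased.

```python
import numpy as np

# Deficient-support illustration: the logger only ever played actions {0, 1},
# but the target policy also plays a new action 2 (unavailable at logging time).
rng = np.random.default_rng(3)
n = 50000
actions = rng.integers(0, 2, size=n)          # the log contains only actions 0 and 1
rewards = np.where(actions == 1, 0.5, 0.1)    # deterministic rewards, for clarity
pi0 = np.full(n, 0.5)                         # uniform logging propensities
# Target policy: 0.2 on action 0, 0.3 on action 1, 0.5 on the new action 2.
pi = np.where(actions == 1, 0.3, 0.2)         # propensity of the *logged* action

ips = float(np.mean(pi / pi0 * rewards))
true_value = 0.2 * 0.1 + 0.3 * 0.5 + 0.5 * 0.9  # suppose new action 2 pays 0.9
print(ips)         # ≈ 0.17: the new action contributes nothing to the estimate
print(true_value)  # 0.62: vanilla IPS is badly biased under deficient support
```

The gap between the two printed numbers is exactly the value the target policy derives from the unlogged action, which no reweighting of the existing data can recover.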
J Lai, L Zou, J Song - arXiv preprint arXiv:2011.14359, 2020 - arxiv.org
Off-policy evaluation is a key component of reinforcement learning that evaluates a target policy using offline data collected by behavior policies. It is a crucial step towards safe …
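In the sequential RL setting, the bandit-style propensity ratio becomes a product over time steps. Here is a minimal sketch of trajectory-wise importance sampling, one standard OPE estimator for this setting (not necessarily the cited paper's method); the data layout and names are illustrative.

```python
import numpy as np

def trajectory_is(trajs, gamma=0.99):
    """Trajectory-wise importance sampling for RL off-policy evaluation.

    Each trajectory is a list of (pi_b, pi_e, reward) tuples, where pi_b and
    pi_e are the behavior and target policies' probabilities of the action
    actually taken at that step. The cumulative likelihood ratio reweights
    each trajectory's discounted return.
    """
    estimates = []
    for traj in trajs:
        rho, ret = 1.0, 0.0
        for t, (pb, pe, r) in enumerate(traj):
            rho *= pe / pb                    # cumulative importance ratio
            ret += (gamma ** t) * r           # discounted return
        estimates.append(rho * ret)
    return float(np.mean(estimates))

# Toy log: two short trajectories collected by a behavior policy.
trajs = [
    [(0.5, 0.8, 1.0), (0.5, 0.8, 0.0)],
    [(0.5, 0.2, 0.0), (0.5, 0.2, 1.0)],
]
print(trajectory_is(trajs))
```

The per-trajectory ratio is a product over the horizon, so its variance grows quickly with trajectory length; much of the OPE literature surveyed in these results is about taming exactly that variance.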