Counterfactual learning and evaluation for recommender systems: Foundations, implementations, and recent advances

Y Saito, T Joachims - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Counterfactual estimators enable the use of existing log data to estimate how some new
target recommendation policy would have performed, if it had been used instead of the …
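
A minimal sketch of the kind of counterfactual estimator the tutorial builds on: inverse propensity scoring (IPS), which reweights logged rewards by the ratio of target to logging propensities. The data layout, numbers, and function name below are illustrative assumptions, not code from the tutorial.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    """IPS estimate of the target policy's value from logged bandit feedback.

    Each logged interaction i contributes reward_i weighted by
    pi_target(a_i | x_i) / pi_logging(a_i | x_i).
    """
    weights = target_propensities / logging_propensities
    return float(np.mean(weights * rewards))

# Toy logged data: a uniform logging policy over two actions,
# evaluated against a target policy that prefers action 1.
rng = np.random.default_rng(0)
n = 10_000
actions = rng.integers(0, 2, size=n)
rewards = rng.binomial(1, np.where(actions == 1, 0.6, 0.4))
logging_propensities = np.full(n, 0.5)
target_propensities = np.where(actions == 1, 0.8, 0.2)

print(ips_estimate(rewards, logging_propensities, target_propensities))
# close to the true target value 0.8 * 0.6 + 0.2 * 0.4 = 0.56
```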

Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
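
The doubly robust (DR) estimator that this line of work extends combines a reward-model baseline with an importance-weighted correction, so it remains unbiased if either the reward model or the propensities are correct. A rough sketch under assumed names; it shows plain DR, not the distributionally robust variant in the paper's title, which would replace the plain empirical mean with a worst case over an uncertainty set of distributions.

```python
import numpy as np

def dr_estimate(rewards, q_hat_logged, q_hat_target_mean, importance_weights):
    """Doubly robust OPE estimate.

    q_hat_logged:       reward-model prediction q_hat(x_i, a_i) for the logged action
    q_hat_target_mean:  E_{a ~ pi_target(.|x_i)}[q_hat(x_i, a)] for each context
    importance_weights: pi_target(a_i | x_i) / pi_logging(a_i | x_i)
    """
    correction = importance_weights * (rewards - q_hat_logged)
    return float(np.mean(q_hat_target_mean + correction))
```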

Evaluating the robustness of off-policy evaluation

Y Saito, T Udagawa, H Kiyohara, K Mogi… - Proceedings of the 15th …, 2021 - dl.acm.org
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the performance of
hypothetical policies leveraging only offline log data. It is particularly useful in applications …

Distributionally robust policy gradient for offline contextual bandits

Z Yang, Y Guo, P Xu, A Liu… - International …, 2023 - proceedings.mlr.press
Learning an optimal policy from offline data is notoriously challenging, as it requires
evaluating the learning policy using data pre-collected from a static logging policy. We …
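
To make the setup concrete, here is a minimal offline policy-gradient loop for a context-free softmax policy that ascends the plain importance-weighted (IPS) objective; the distributionally robust objective of the paper is not reproduced, and every name below is an assumption.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def offline_policy_gradient(actions, rewards, logging_propensities, n_actions,
                            lr=0.1, n_steps=200):
    """Gradient ascent on J(theta) = mean_i [pi_theta(a_i) / mu(a_i)] * r_i."""
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    logging_propensities = np.asarray(logging_propensities, dtype=float)
    theta = np.zeros(n_actions)
    one_hot = np.eye(n_actions)
    for _ in range(n_steps):
        probs = softmax(theta)
        iw = probs[actions] / logging_propensities
        # Score-function gradient: d log pi_theta(a) / d theta = one_hot(a) - probs.
        grad = np.mean(iw[:, None] * rewards[:, None] * (one_hot[actions] - probs), axis=0)
        theta += lr * grad
    return softmax(theta)
```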

Recommendations as treatments

T Joachims, B London, Y Su, A Swaminathan, L Wang - AI Magazine, 2021 - ojs.aaai.org
In recent years, a new line of research has taken an interventional view of recommender
systems, where recommendations are viewed as actions that the system takes to have a …

Control variate diagnostics for detecting problems in logged bandit feedback

B London, T Joachims - 2022 - amazon.science
We propose diagnostics, based on control variates, to detect data quality issues in logged
bandit feedback data, which is of critical importance for accurate offline evaluation and …
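
As the abstract suggests, the central fact such diagnostics exploit is that importance weights form a control variate with expectation 1 when propensities are logged correctly and support holds; a large deviation of their empirical mean from 1 flags a problem. A hypothetical sketch of such a check, not the authors' implementation:

```python
import numpy as np

def weight_mean_diagnostic(target_propensities, logging_propensities, z=1.96):
    """Flag logged bandit data whose importance weights do not average to ~1.

    Under correctly logged propensities and full support,
    E[pi_target(a|x) / pi_logging(a|x)] = 1, so an empirical mean far from 1
    relative to its standard error suggests a data-quality issue.
    """
    w = np.asarray(target_propensities) / np.asarray(logging_propensities)
    mean = w.mean()
    stderr = w.std(ddof=1) / np.sqrt(len(w))
    return {"mean_weight": mean, "stderr": stderr,
            "suspicious": abs(mean - 1.0) > z * stderr}
```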

Offline policy evaluation with new arms

B London, T Joachims - 2020 - amazon.science
We study offline policy evaluation in a setting where the target policy can take actions that
were not available when the data was logged. We analyze the bias of two popular …
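
One way to see the difficulty: importance-weighting estimators only cover actions the logging policy could have taken, so any probability mass the target policy places on new arms is unidentified from the log and must be handled separately, for example by a reward model. A toy illustration with assumed names:

```python
import numpy as np

def unsupported_mass(target_policy_probs, available_at_logging):
    """Probability mass the target policy puts on arms that did not exist
    (or had zero propensity) at logging time; IPS says nothing about it."""
    p = np.asarray(target_policy_probs)
    return float(p[~np.asarray(available_at_logging)].sum())

target_policy_probs = [0.5, 0.3, 0.2]          # target policy over 3 arms
available_at_logging = [True, True, False]     # arm 2 is a new arm
print(unsupported_mass(target_policy_probs, available_at_logging))  # 0.2 of the value is unidentified
```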

Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies

J Lai, L Zou, J Song - arXiv preprint arXiv:2011.14359, 2020 - arxiv.org
Off-policy evaluation, a key component of reinforcement learning, evaluates a target
policy with offline data collected from behavior policies. It is a crucial step towards safe …
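
A common baseline for this setting, sketched below under assumed names, is to form an IPS estimate separately from each behavior policy's data and combine the estimates with inverse-variance weights, which minimizes the variance of the convex combination when the per-policy estimates are unbiased and independent; the paper studies how to choose such mixture weights optimally.

```python
import numpy as np

def inverse_variance_mixture(estimates, variances):
    """Combine independent, unbiased per-behavior-policy OPE estimates.

    Weighting each estimate by the inverse of its (estimated) variance
    minimizes the variance of the convex combination.
    """
    w = 1.0 / np.asarray(variances, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, estimates)), w

value, weights = inverse_variance_mixture([0.52, 0.47, 0.55], [0.01, 0.04, 0.02])
print(value, weights)
```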

Evaluating Off-Policy Evaluation: Sensitivity and Robustness

Y Saito, T Udagawa, H Kiyohara, K Mogi, Y Narita… - 2021 - bcirwis2021.github.io
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the
performance of hypothetical policies leveraging only offline log data. It is particularly useful …