Counterfactual learning and evaluation for recommender systems: Foundations, implementations, and recent advances

Y Saito, T Joachims - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Counterfactual estimators enable the use of existing log data to estimate how some new
target recommendation policy would have performed, if it had been used instead of the …
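
A minimal sketch of the kind of counterfactual estimator the tutorial builds on: inverse propensity scoring (IPS), which reweights logged rewards by the ratio of target to logging propensities. The data layout, numbers, and function name below are illustrative assumptions, not code from the tutorial.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    """IPS estimate of the target policy's value from logged bandit feedback.

    Each logged interaction i contributes reward_i weighted by
    pi_target(a_i | x_i) / pi_logging(a_i | x_i).
    """
    weights = target_propensities / logging_propensities
    return float(np.mean(weights * rewards))

# Toy logged data: a uniform logging policy over two actions,
# evaluated against a target policy that prefers action 1.
rng = np.random.default_rng(0)
n = 10_000
actions = rng.integers(0, 2, size=n)
rewards = rng.binomial(1, np.where(actions == 1, 0.6, 0.4))
logging_propensities = np.full(n, 0.5)
target_propensities = np.where(actions == 1, 0.8, 0.2)

print(ips_estimate(rewards, logging_propensities, target_propensities))
# close to the true target value 0.8 * 0.6 + 0.2 * 0.4 = 0.56
```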

Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
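
The doubly robust (DR) estimator that this line of work extends combines a reward-model baseline with an importance-weighted correction, so it remains unbiased if either the reward model or the propensities are correct. A rough sketch under assumed names; it shows plain DR, not the distributionally robust variant in the paper's title, which would replace the plain empirical mean with a worst case over an uncertainty set of distributions.

```python
import numpy as np

def dr_estimate(rewards, q_hat_logged, q_hat_target_mean, importance_weights):
    """Doubly robust OPE estimate.

    q_hat_logged:       reward-model prediction q_hat(x_i, a_i) for the logged action
    q_hat_target_mean:  E_{a ~ pi_target(.|x_i)}[q_hat(x_i, a)] for each context
    importance_weights: pi_target(a_i | x_i) / pi_logging(a_i | x_i)
    """
    correction = importance_weights * (rewards - q_hat_logged)
    return float(np.mean(q_hat_target_mean + correction))
```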

Evaluating the robustness of off-policy evaluation

Y Saito, T Udagawa, H Kiyohara, K Mogi… - Proceedings of the 15th …, 2021 - dl.acm.org
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the performance of
hypothetical policies leveraging only offline log data. It is particularly useful in applications …

Distributionally robust policy gradient for offline contextual bandits

Z Yang, Y Guo, P Xu, A Liu… - International …, 2023 - proceedings.mlr.press
Learning an optimal policy from offline data is notoriously challenging, as it requires
evaluating the learning policy using data pre-collected from a static logging policy. We …
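
To make the setup concrete, here is a minimal offline policy-gradient loop for a context-free softmax policy that ascends the plain importance-weighted (IPS) objective; the distributionally robust objective of the paper is not reproduced, and every name below is an assumption.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def offline_policy_gradient(actions, rewards, logging_propensities, n_actions,
                            lr=0.1, n_steps=200):
    """Gradient ascent on J(theta) = mean_i [pi_theta(a_i) / mu(a_i)] * r_i."""
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    logging_propensities = np.asarray(logging_propensities, dtype=float)
    theta = np.zeros(n_actions)
    one_hot = np.eye(n_actions)
    for _ in range(n_steps):
        probs = softmax(theta)
        iw = probs[actions] / logging_propensities
        # Score-function gradient: d log pi_theta(a) / d theta = one_hot(a) - probs.
        grad = np.mean(iw[:, None] * rewards[:, None] * (one_hot[actions] - probs), axis=0)
        theta += lr * grad
    return softmax(theta)
```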

Recommendations as treatments

T Joachims, B London, Y Su, A Swaminathan, L Wang - AI Magazine, 2021 - ojs.aaai.org
In recent years, a new line of research has taken an interventional view of recommender
systems, where recommendations are viewed as actions that the system takes to have a …

Control variate diagnostics for detecting problems in logged bandit feedback

B London, T Joachims - 2022 - amazon.science
We propose diagnostics, based on control variates, to detect data quality issues in logged
bandit feedback data, which is of critical importance for accurate offline evaluation and …
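
As the abstract suggests, the central fact such diagnostics exploit is that importance weights form a control variate with expectation 1 when propensities are logged correctly and support holds; a large deviation of their empirical mean from 1 flags a problem. A hypothetical sketch of such a check, not the authors' implementation:

```python
import numpy as np

def weight_mean_diagnostic(target_propensities, logging_propensities, z=1.96):
    """Flag logged bandit data whose importance weights do not average to ~1.

    Under correctly logged propensities and full support,
    E[pi_target(a|x) / pi_logging(a|x)] = 1, so an empirical mean far from 1
    relative to its standard error suggests a data-quality issue.
    """
    w = np.asarray(target_propensities) / np.asarray(logging_propensities)
    mean = w.mean()
    stderr = w.std(ddof=1) / np.sqrt(len(w))
    return {"mean_weight": mean, "stderr": stderr,
            "suspicious": abs(mean - 1.0) > z * stderr}
```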

Offline policy evaluation with new arms

B London, T Joachims - 2020 - amazon.science
We study offline policy evaluation in a setting where the target policy can take actions that
were not available when the data was logged. We analyze the bias of two popular …
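
One way to see the difficulty: importance-weighting estimators only cover actions the logging policy could have taken, so any probability mass the target policy places on new arms is unidentified from the log and must be handled separately, for example by a reward model. A toy illustration with assumed names:

```python
import numpy as np

def unsupported_mass(target_policy_probs, available_at_logging):
    """Probability mass the target policy puts on arms that did not exist
    (or had zero propensity) at logging time; IPS says nothing about it."""
    p = np.asarray(target_policy_probs)
    return float(p[~np.asarray(available_at_logging)].sum())

target_policy_probs = [0.5, 0.3, 0.2]          # target policy over 3 arms
available_at_logging = [True, True, False]     # arm 2 is a new arm
print(unsupported_mass(target_policy_probs, available_at_logging))  # 0.2 of the value is unidentified
```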

Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies

J Lai, L Zou, J Song - arXiv preprint arXiv:2011.14359, 2020 - arxiv.org
Off-policy evaluation, a key component of reinforcement learning, evaluates a target
policy with offline data collected from behavior policies. It is a crucial step towards safe …
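
A common baseline for this setting, sketched below under assumed names, is to form an IPS estimate separately from each behavior policy's data and combine the estimates with inverse-variance weights, which minimizes the variance of the convex combination when the per-policy estimates are unbiased and independent; the paper studies how to choose such mixture weights optimally.

```python
import numpy as np

def inverse_variance_mixture(estimates, variances):
    """Combine independent, unbiased per-behavior-policy OPE estimates.

    Weighting each estimate by the inverse of its (estimated) variance
    minimizes the variance of the convex combination.
    """
    w = 1.0 / np.asarray(variances, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, estimates)), w

value, weights = inverse_variance_mixture([0.52, 0.47, 0.55], [0.01, 0.04, 0.02])
print(value, weights)
```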

Evaluating Off-Policy Evaluation: Sensitivity and Robustness

Y Saito, T Udagawa, H Kiyohara, K Mogi, Y Narita… - 2021 - bcirwis2021.github.io
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the
performance of hypothetical policies leveraging only offline log data. It is particularly useful …