Off-policy evaluation for large action spaces via conjunct effect modeling

Y Saito, Q Ren, T Joachims - International Conference on …, 2023 - proceedings.mlr.press
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …
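For context on the variance issue this snippet alludes to: vanilla inverse propensity scoring (IPS) reweights logged rewards by pi_e(a|x) / pi_0(a|x), and those weights grow with the size of the action space. A minimal NumPy sketch, with all data and policy choices hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 10_000, 5_000  # many logged rounds, large action space

# Hypothetical logs: a uniform logging policy and binary rewards.
pi_0 = np.full(n_actions, 1.0 / n_actions)          # logging propensities
actions = rng.integers(0, n_actions, size=n)        # logged actions
rewards = rng.binomial(1, 0.1, size=n).astype(float)

# Hypothetical target policy concentrated on 50 of the 5,000 actions.
pi_e = np.zeros(n_actions)
pi_e[:50] = 1.0 / 50

# Vanilla IPS: most logged actions get weight 0, the rare matches get
# weight (1/50) / (1/5000) = 100, so the estimate rests on few samples
# and its variance grows with the size of the action space.
w = pi_e[actions] / pi_0[actions]
print(f"IPS estimate: {np.mean(w * rewards):.4f}  (max weight: {w.max():.0f})")
```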

Off-policy evaluation of ranking policies under diverse user behavior

H Kiyohara, M Uehara, Y Narita, N Shimizu… - Proceedings of the 29th …, 2023 - dl.acm.org
Ranking interfaces are everywhere in online platforms. There is thus an ever-growing
interest in their Off-Policy Evaluation (OPE), aiming towards an accurate performance …
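The ranking setting makes the weight problem concrete: an importance weight over a whole ranking is a product of per-slot weights. The toy NumPy contrast below, between full-ranking IPS and item-position IPS, shows the trade-off such work navigates; it is not the paper's estimator, and item-position IPS is unbiased only under a particular user-behavior assumption. All names and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_items, K = 5_000, 100, 5  # sessions, catalogue size, ranking length

# Hypothetical logs: uniform-random rankings and per-position clicks.
rankings = rng.integers(0, n_items, size=(n, K))
clicks = rng.binomial(1, 0.05, size=(n, K)).astype(float)
pi0_pos = np.full((n, K), 1.0 / n_items)  # logging propensity per slot
p_e = rng.dirichlet(np.ones(n_items))     # hypothetical target marginal per slot
pie_pos = p_e[rankings]

# Full-ranking IPS multiplies K per-slot weights, so its variance grows
# exponentially in K; item-position IPS weights slots separately, but is
# unbiased only if the click at slot k ignores the items at other slots.
w_full = np.prod(pie_pos / pi0_pos, axis=1)
w_item = pie_pos / pi0_pos
v_full = np.mean(w_full * clicks.sum(axis=1))
v_item = np.mean((w_item * clicks).sum(axis=1))
print(f"full-ranking IPS: {v_full:.4f}, item-position IPS: {v_item:.4f}")
```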

Off-policy evaluation of slate bandit policies via optimizing abstraction

H Kiyohara, M Nomura, Y Saito - Proceedings of the ACM on Web …, 2024 - dl.acm.org
We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a
policy selects multi-dimensional actions known as slates. This problem is widespread in …
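To illustrate the general flavor of abstraction-based slate OPE: map each slate into a small abstraction space and reweight there, instead of over the astronomically many raw slates. The paper *optimizes* the abstraction; the sketch below uses a fixed hash purely to show the mechanics, and every name and distribution in it is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_slots, n_actions, n_clusters = 8_000, 3, 50, 20

# Hypothetical logs: slates from a uniform logging policy, binary rewards.
slates = rng.integers(0, n_actions, size=(n, n_slots))
rewards = rng.binomial(1, 0.1, size=n).astype(float)

# Toy abstraction: hash each slate down to one of n_clusters ids.
def phi(slate):
    return hash(tuple(int(a) for a in slate)) % n_clusters

z = np.array([phi(s) for s in slates])

# Estimate each policy's distribution over abstractions by sampling, then
# reweight in abstraction space, sidestepping the n_actions**n_slots slates.
def cluster_dist(sample_slate, m=100_000):
    zs = np.array([phi(sample_slate()) for _ in range(m)])
    return np.bincount(zs, minlength=n_clusters) / m

p0_z = cluster_dist(lambda: rng.integers(0, n_actions, size=n_slots))
pe_z = cluster_dist(lambda: rng.integers(0, n_actions // 2, size=n_slots))
w = pe_z[z] / np.maximum(p0_z[z], 1e-12)
print(f"abstraction-IPS estimate: {np.mean(w * rewards):.4f}")
```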

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

Y Saito, J Yao, T Joachims - arXiv preprint arXiv:2402.06151, 2024 - arxiv.org
We study off-policy learning (OPL) of contextual bandit policies in large discrete action
spaces where existing methods--most of which rely crucially on reward-regression models …
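A rough sketch of the two-stage decomposition the abstract describes: a first-stage policy over clusters of actions, then a second-stage selection within the chosen cluster. The clustering, regression estimates, and names below are hypothetical placeholders, and the paper's cluster-level policy-gradient training of the first stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
n_actions, n_clusters = 1_000, 10

# Hypothetical fixed clustering of actions and reward-regression estimates.
cluster_of = rng.integers(0, n_clusters, size=n_actions)
q_hat = rng.random(n_actions)  # assumed regression-model reward estimates

def two_stage_policy(cluster_probs):
    c = rng.choice(n_clusters, p=cluster_probs)    # stage 1: pick a cluster
    members = np.flatnonzero(cluster_of == c)
    return members[np.argmax(q_hat[members])]      # stage 2: greedy in cluster

print("chosen action:", two_stage_policy(np.full(n_clusters, 1.0 / n_clusters)))
```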

On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation

O Jeunen, I Potapov, A Ustimenko - Proceedings of the 30th ACM …, 2024 - dl.acm.org
Approaches to recommendation are typically evaluated in one of two ways: (1) via a
(simulated) online experiment, often seen as the gold standard, or (2) via some offline …
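For reference, the metric under discussion, implemented with the common log2 discount; this is the textbook formula, not the paper's analysis of its estimation properties:

```python
import numpy as np

def dcg(relevances):
    """DCG with the common log2 discount: sum_i rel_i / log2(i + 2)."""
    rel = np.asarray(relevances, dtype=float)
    return float(np.sum(rel / np.log2(np.arange(len(rel)) + 2)))

def ndcg(ranked_rels):
    """Normalise by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

print(ndcg([0, 1, 1, 0]))  # ~0.693: relevant items sit at ranks 2 and 3
```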

When is off-policy evaluation useful? a data-centric perspective

H Sun, AJ Chan, N Seedat, A Hüyük… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

H Kiyohara, R Kishimoto, K Kawakami… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces SCOPE-RL, a comprehensive open-source Python software designed
for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection …

AutoOPE: Automated Off-Policy Estimator Selection

N Felicioni, M Benigni, MF Dacrema - arXiv preprint arXiv:2406.18022, 2024 - arxiv.org
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
counterfactual policies with data collected by a different policy. This problem is of utmost …

Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits

T Shimizu, K Tanaka, R Kishimoto, H Kiyohara… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits
(CCB), where a policy selects a subset of the action space. For example, it might choose a …

Cross-Validated Off-Policy Evaluation

M Cief, M Kompan, B Kveton - arXiv preprint arXiv:2405.15332, 2024 - arxiv.org
In this paper, we study the problem of estimator selection and hyper-parameter tuning
in off-policy evaluation. Although cross-validation is the most popular method for model selection …
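The generic recipe behind cross-validation for OPE is to hold out folds of the logged data and score candidate estimators against an unbiased but noisy estimate on the held-out fold. The sketch below shows that recipe with IPS and self-normalized IPS as candidates; it is an illustration under simplified assumptions, not necessarily the paper's exact procedure, and all data is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_actions = 20_000, 20

# Hypothetical logged bandit feedback from a uniform logging policy.
pi0 = np.full(n_actions, 1.0 / n_actions)
a = rng.integers(0, n_actions, size=n)
r = rng.binomial(1, 0.2 + 0.02 * (a % 5)).astype(float)
pie = rng.dirichlet(np.ones(n_actions))  # fixed hypothetical target policy

def ips(a, r):                    # unbiased but high-variance
    return np.mean(pie[a] / pi0[a] * r)

def snips(a, r):                  # self-normalised: biased but stabler
    w = pie[a] / pi0[a]
    return np.sum(w * r) / np.sum(w)

# Hold out folds; score each candidate against the unbiased (noisy) IPS
# estimate on the held-out fold; keep the candidate with the lowest error.
folds = np.array_split(rng.permutation(n), 5)
errors = {est.__name__: 0.0 for est in (ips, snips)}
for held in folds:
    train = np.setdiff1d(np.arange(n), held)
    target = ips(a[held], r[held])
    for est in (ips, snips):
        errors[est.__name__] += (est(a[train], r[train]) - target) ** 2
print("selected estimator:", min(errors, key=errors.get))
```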