An instrumental variable approach to confounded off-policy evaluation

Y Xu, J Zhu, C Shi, S Luo… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many …
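All of the entries below share the same basic estimand: the value of a target policy, estimated from logged data via importance weighting. A minimal sketch of the vanilla inverse-propensity-scoring (IPS) baseline on synthetic bandit data follows; every name in it is illustrative, and IPS is only unbiased when the logged propensities are correct and there is no unmeasured confounding, which is precisely the assumption the instrumental-variable approach above relaxes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logged bandit data: contexts x, actions from a uniform logging policy.
n, n_actions = 10_000, 2
x = rng.normal(size=n)
behavior = np.full((n, n_actions), 1.0 / n_actions)
a = rng.integers(0, n_actions, size=n)
best = (x > 0).astype(int)                       # context-dependent best arm
r = (a == best).astype(float) + 0.1 * rng.normal(size=n)

# Deterministic target policy: play the arm matched to the context's sign.
target = np.zeros((n, n_actions))
target[np.arange(n), best] = 1.0

# Vanilla IPS: reweight logged rewards by target/behavior propensities.
w = target[np.arange(n), a] / behavior[np.arange(n), a]
print(f"IPS value estimate: {np.mean(w * r):.3f}  (true value: 1.0)")
```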

Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning

N Kallus, M Uehara - Operations Research, 2022 - pubsonline.informs.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In …
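The "curse of horizon" can be seen directly in the trajectory-level importance weight, a product of per-step ratios whose variance grows exponentially with the horizon. A toy sketch with synthetic mean-one ratios (not the paper's double reinforcement learning estimator, which is designed to avoid exactly this blow-up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-step importance ratios pi_e(a_t|s_t) / pi_b(a_t|s_t) with mean one.
n_traj, horizon = 5_000, 50
step_ratios = rng.choice([0.5, 1.5], size=(n_traj, horizon))

# The trajectory-level weight is the product of per-step ratios; its
# variance grows exponentially in the horizon -- the "curse of horizon".
for h in (1, 10, 25, 50):
    w = step_ratios[:, :h].prod(axis=1)
    print(f"horizon {h:3d}: weight std = {w.std():10.2f}")
```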

Policy-adaptive estimator selection for off-policy evaluation

T Udagawa, H Kiyohara, Y Narita, Y Saito… - Proceedings of the AAAI Conference on Artificial Intelligence, 2023 - ojs.aaai.org
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …
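Among the standard estimators such a selection procedure must choose between are the direct method (DM), inverse propensity scoring (IPS), and their doubly robust (DR) combination. A hedged sketch on synthetic data, with the reward model deliberately misspecified to show why no single estimator dominates:

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_actions = 10_000, 3
x = rng.normal(size=n)
behavior = np.full((n, n_actions), 1.0 / n_actions)
a = rng.integers(0, n_actions, size=n)
r = 0.5 * a + 0.2 * x + rng.normal(scale=0.5, size=n)

target = np.zeros((n, n_actions))
target[:, n_actions - 1] = 1.0              # target always plays the last arm

# A crude (deliberately misspecified) reward model for the direct method.
q_hat = np.full((n, n_actions), r.mean())

dm = (target * q_hat).sum(axis=1).mean()                    # direct method
w = target[np.arange(n), a] / behavior[np.arange(n), a]
ips = np.mean(w * r)                                        # importance sampling
dr = dm + np.mean(w * (r - q_hat[np.arange(n), a]))         # doubly robust
print(f"DM {dm:.3f} | IPS {ips:.3f} | DR {dr:.3f}  (true value: 1.0)")
```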

Off-policy evaluation with deficient support using side information

N Felicioni, M Ferrari Dacrema… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of new policies using data collected by another one. OPE is crucial when evaluating a new …
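Deficient support means the behavior policy assigns zero probability to actions the target policy would take; importance weighting then fails silently rather than raising an error. A toy illustration (all names synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_actions = 10_000, 4

# The logging policy never plays action 3: support is deficient for any
# target policy that puts mass on it.
behavior = np.array([0.4, 0.4, 0.2, 0.0])
a = rng.choice(n_actions, size=n, p=behavior)
r = 0.25 * a + rng.normal(scale=0.1, size=n)   # action 3 would be best

target = np.array([0.0, 0.0, 0.0, 1.0])        # target plays only action 3
w = target[a] / behavior[a]                    # zero for every logged action

# IPS returns 0 no matter how good action 3 is: silently biased, no error.
print("IPS estimate:", np.mean(w * r))
print("true value  :", 0.25 * 3)
```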

Quantile off-policy evaluation via deep conditional generative learning

Y Xu, C Shi, S Luo, L Wang, R Song - arXiv preprint arXiv:2212.14466, 2022 - arxiv.org
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of …
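This entry targets quantiles of the return distribution rather than its mean. The paper's method is a deep conditional generative model; the simpler reweighting baseline it improves on is a weighted empirical quantile of logged returns under importance weights, sketched here with synthetic data:

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """q-th quantile of the importance-weight-reweighted return distribution."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = min(np.searchsorted(cdf, q), len(v) - 1)
    return v[idx]

rng = np.random.default_rng(3)
n = 10_000
returns = rng.normal(loc=1.0, scale=2.0, size=n)          # logged returns
weights = rng.lognormal(mean=-0.125, sigma=0.5, size=n)   # toy IS weights

# Median and lower-tail quantile of the target policy's return distribution.
print("median:", weighted_quantile(returns, weights, 0.5))
print("5% quantile:", weighted_quantile(returns, weights, 0.05))
```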

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …
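A generic way to turn any OPE point estimate into an interval is a nonparametric bootstrap over trajectories; the deeply-debiased construction in this paper is designed for better coverage than such baselines. A minimal sketch with synthetic weighted returns:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

# Per-trajectory IS-weighted returns w_i * G_i, as in the sketches above.
weighted_returns = rng.lognormal(sigma=1.0, size=n) * rng.normal(1.0, 0.5, size=n)

# Nonparametric bootstrap interval around the OPE point estimate.
boot = np.array([rng.choice(weighted_returns, size=n, replace=True).mean()
                 for _ in range(2_000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"point {weighted_returns.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```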

State-action similarity-based representations for off-policy evaluation

B Pavse, J Hanna - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the
expected return of an evaluation policy given a fixed dataset that was collected by running …
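Representation-learning approaches of this kind typically plug into fitted Q-evaluation (FQE), which repeatedly regresses Q(s, a) onto r + γ·Q(s′, π_e(s′)) over the fixed dataset; that FQE is the base routine here is my reading, not a quote from the abstract. A minimal tabular sketch with synthetic transitions:

```python
import numpy as np

def fitted_q_evaluation(s, a, r, s2, pi_e, n_states, n_actions,
                        gamma=0.95, n_iters=200):
    """Tabular FQE: repeatedly fit Q(s, a) to r + gamma * Q(s', pi_e(s'))."""
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = r + gamma * q[s2, pi_e[s2]]
        sums = np.zeros_like(q)
        counts = np.zeros_like(q)
        np.add.at(sums, (s, a), targets)
        np.add.at(counts, (s, a), 1.0)
        # Tabular least-squares fit = per-cell average of regression targets.
        q = np.divide(sums, counts, out=q, where=counts > 0)
    return q

rng = np.random.default_rng(8)
n_states, n_actions, n = 5, 2, 20_000
s = rng.integers(0, n_states, size=n)
a = rng.integers(0, n_actions, size=n)
r = (a == 0).astype(float)                  # action 0 always pays 1
s2 = rng.integers(0, n_states, size=n)
pi_e = np.zeros(n_states, dtype=int)        # evaluation policy: always action 0

q = fitted_q_evaluation(s, a, r, s2, pi_e, n_states, n_actions)
print("estimated value:", q[:, 0].mean(), "(analytic: 1 / (1 - 0.95) = 20)")
```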

Understanding the curse of horizon in off-policy evaluation via conditional importance sampling

Y Liu, PL Bacon, E Brunskill - International Conference on Machine Learning, 2020 - proceedings.mlr.press
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …
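The classical variance-reduction step that conditional-IS analyses of this kind cover is per-decision importance sampling (PDIS), which weights the reward at step t only by the ratios accumulated up to t instead of the full-trajectory product. A toy comparison on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
n_traj, horizon, gamma = 5_000, 30, 0.95

ratios = rng.choice([0.5, 1.5], size=(n_traj, horizon))  # per-step pi_e/pi_b
rewards = rng.normal(1.0, 1.0, size=(n_traj, horizon))
discounts = gamma ** np.arange(horizon)

# Trajectory-level IS: one weight (product over ALL steps) per trajectory.
w_traj = ratios.prod(axis=1, keepdims=True)
v_is = (w_traj * discounts * rewards).sum(axis=1)

# Per-decision IS: reward at step t is weighted only by ratios up to t.
w_pd = np.cumprod(ratios, axis=1)
v_pdis = (w_pd * discounts * rewards).sum(axis=1)

print(f"IS   estimate {v_is.mean():8.2f}  (std {v_is.std():10.2f})")
print(f"PDIS estimate {v_pdis.mean():8.2f}  (std {v_pdis.std():10.2f})")
```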

Off-policy evaluation for large action spaces via conjunct effect modeling

Y Saito, Q Ren, T Joachims - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …
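With many actions, the IPS weights π_e(a|x)/π_b(a|x) become enormous. One structural fix, in the spirit of (though not identical to) conjunct effect modeling, is to marginalize the weights over action clusters when rewards depend on actions only through their cluster. A hedged sketch under exactly that assumption:

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_actions, n_clusters = 20_000, 1_000, 10

cluster_of = rng.integers(0, n_clusters, size=n_actions)  # action -> cluster
behavior = np.full(n_actions, 1.0 / n_actions)            # uniform logging
a = rng.integers(0, n_actions, size=n)
r = (cluster_of[a] == 0).astype(float)    # reward depends only on the cluster

target = rng.dirichlet(np.full(n_actions, 0.1))           # peaked target policy

# Vanilla IPS: per-action weights with huge spread across 1,000 actions.
w_ips = target[a] / behavior[a]

# Cluster-level weights: valid when rewards depend on actions only through
# their cluster -- the kind of structure conjunct effect modeling exploits.
p_cluster_target = np.bincount(cluster_of, weights=target, minlength=n_clusters)
p_cluster_behavior = np.bincount(cluster_of, weights=behavior, minlength=n_clusters)
w_cluster = p_cluster_target[cluster_of[a]] / p_cluster_behavior[cluster_of[a]]

print(f"IPS     : est {np.mean(w_ips * r):.3f}, weight std {w_ips.std():7.2f}")
print(f"cluster : est {np.mean(w_cluster * r):.3f}, weight std {w_cluster.std():7.2f}")
```

Both estimators target the same value; the cluster-level weights trade the per-action variance for a much smaller cluster-level spread.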