Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
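The snippet cuts off before the method, but the doubly robust (DR) construction the title names is standard for contextual bandits; below is a minimal sketch, assuming logged tuples of context, action, reward, and logging propensity, plus a fitted reward model q_hat and a target policy pi_e (all names hypothetical, not the paper's code).

import numpy as np

def dr_policy_value(contexts, actions, rewards, propensities, pi_e, q_hat):
    """Doubly robust off-policy value estimate for a contextual bandit.

    pi_e(x) -> array of target-policy action probabilities for context x.
    q_hat(x, a) -> fitted estimate of the expected reward of action a at x.
    Consistent if either the propensities or the reward model is correct.
    """
    scores = np.empty(len(rewards))
    for i, (x, a, r, p) in enumerate(zip(contexts, actions, rewards, propensities)):
        probs = pi_e(x)
        # direct-method term: model-based value of the target policy at x
        dm = float(np.dot(probs, [q_hat(x, k) for k in range(len(probs))]))
        # importance-weighted residual corrects the model's bias on logged data
        scores[i] = dm + probs[a] / p * (r - q_hat(x, a))
    return float(scores.mean())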

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …
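For reference, the sequential doubly robust form this line of work builds on combines cumulative importance ratios with an estimated Q-function q and its policy average v; a sketch in standard notation (not the paper's exact display):

\hat V_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=0}^{T}\gamma^{t}\Bigl[\rho_{t}^{(i)}\bigl(r_{t}^{(i)} - q(s_{t}^{(i)}, a_{t}^{(i)})\bigr) + \rho_{t-1}^{(i)}\, v(s_{t}^{(i)})\Bigr],
\qquad \rho_{t}^{(i)} = \prod_{k=0}^{t}\frac{\pi_{e}(a_{k}^{(i)} \mid s_{k}^{(i)})}{\pi_{b}(a_{k}^{(i)} \mid s_{k}^{(i)})}, \quad \rho_{-1}^{(i)} = 1,

with v(s) = \mathbb{E}_{a \sim \pi_{e}}[q(s, a)]. In the Markov setting the paper replaces the cumulative ratios with marginal density ratios and cross-fits the nuisances, which is how it attains the semiparametric efficiency bound.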

Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders

A Bennett, N Kallus, L Li… - … Conference on Artificial …, 2021 - proceedings.mlr.press
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …

Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning

N Kallus, M Uehara - Operations Research, 2022 - pubsonline.informs.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and
infinite-horizon settings due to diminishing overlap between behavior and target policies. In …
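The "curse of horizon" refers to the variance of cumulative products of per-step importance ratios, which grows exponentially with trajectory length. A hedged sketch of the marginalized alternative, assuming an estimated stationary density ratio is supplied (e.g. by a minimax method; the ratio estimator itself is out of scope here):

import numpy as np

def mis_average_reward(states, actions, rewards, state_ratio, action_ratio):
    """Marginalized importance sampling for infinite-horizon OPE.

    state_ratio(s)  ~ d_pi(s) / d_b(s), an estimated stationary density ratio.
    action_ratio(s, a) = pi_e(a | s) / pi_b(a | s), a single per-step ratio.
    One per-transition weight replaces the cumulative product of ratios
    whose variance explodes with the horizon.
    """
    w = np.array([state_ratio(s) * action_ratio(s, a)
                  for s, a in zip(states, actions)])
    return float(np.sum(w * np.asarray(rewards, dtype=float)) / np.sum(w))  # self-normalized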

[CITATION][C] Near optimal provable uniform convergence in off-policy evaluation for reinforcement learning

M Yin, Y Bai, YX Wang - arXiv preprint arXiv:2007.03760, 2020

Intrinsically efficient, stable, and bounded off-policy evaluation for reinforcement learning

N Kallus, M Uehara - Advances in neural information …, 2019 - proceedings.neurips.cc
Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows
one to evaluate novel decision policies without needing to conduct exploration, which is …
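The boundedness in the title is the kind self-normalized estimators enjoy; a minimal illustration of the contrast with plain IPW (not the paper's estimator, which generalizes this idea):

import numpy as np

def ipw_value(rewards, weights):
    # plain IPW: unbiased, but the estimate can leave the observed reward range
    return float(np.mean(np.asarray(weights) * np.asarray(rewards)))

def snipw_value(rewards, weights):
    # self-normalized IPW: dividing by the realized weight sum keeps the
    # estimate inside [min(rewards), max(rewards)], i.e. bounded and stable
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * np.asarray(rewards)) / np.sum(w))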

[CITATION][C] Efficiently breaking the curse of horizon: Double reinforcement learning in infinite-horizon processes

N Kallus, M Uehara - arXiv preprint arXiv:1909.05850, 2019

Off-policy evaluation via adaptive weighting with data from contextual bandits

R Zhan, V Hadad, DA Hirshberg, S Athey - Proceedings of the 27th ACM …, 2021 - dl.acm.org
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …
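A hedged sketch of the general recipe as I read the abstract: average per-observation AIPW scores under evaluator-chosen adaptive weights h_i, where each score is computed with the propensity in force when that observation was logged; the paper's contribution is the variance-stabilizing choice of h, which is not reproduced here.

import numpy as np

def weighted_aipw_value(aipw_scores, h):
    """Adaptively weighted estimate from per-observation AIPW scores.

    aipw_scores[i]: doubly robust score for observation i.
    h[i]: nonnegative adaptive weight; h = ones recovers the plain average.
    """
    h = np.asarray(h, dtype=float)
    return float(np.sum(h * np.asarray(aipw_scores, dtype=float)) / np.sum(h))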

State-action similarity-based representations for off-policy evaluation

B Pavse, J Hanna - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the
expected return of an evaluation policy given a fixed dataset that was collected by running …
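Such a fixed dataset is typically consumed by fitted Q-evaluation (FQE), the model-free OPE backbone that representation methods like this one plug into; a tabular sketch for intuition (hypothetical names, small state spaces only; the paper's setting uses learned representations with function approximation):

import numpy as np

def fqe_tabular(transitions, pi_e, n_states, n_actions, gamma=0.99, sweeps=200):
    """Fitted Q-evaluation on a fixed dataset of (s, a, r, s_next) tuples.

    pi_e: (n_states, n_actions) array of target-policy probabilities.
    Repeatedly regresses q(s, a) onto the one-step Bellman target under
    pi_e; here 'regression' is just averaging within each (s, a) cell.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        totals = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next in transitions:
            totals[s, a] += r + gamma * float(np.dot(pi_e[s_next], q[s_next]))
            counts[s, a] += 1
        seen = counts > 0
        q[seen] = totals[seen] / counts[seen]
    return q  # evaluate as np.dot(pi_e[s0], q[s0]) for a start state s0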

Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation

Y Saito, S Aihara, M Matsutani, Y Narita - arXiv preprint arXiv:2008.07146, 2020 - arxiv.org
Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using
data generated by a different policy. Because of its huge potential impact in practice, there …
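A minimal sketch of the basic IPW estimate such a pipeline computes, written against logged-feedback fields in the style the project documents (action, reward, and logging propensity "pscore"); the field names and the 2-D shape of action_dist are my simplifying assumptions and should be checked against the library itself:

import numpy as np

def ipw_from_logs(action, reward, pscore, action_dist):
    """IPW value of a target policy from logged bandit feedback.

    action, reward, pscore: NumPy arrays of logged actions (int), rewards,
    and the logging policy's probability of each logged action.
    action_dist[i, a]: target policy's probability of action a in round i.
    """
    n = len(reward)
    w = action_dist[np.arange(n), action] / pscore  # per-round importance weight
    return float(np.mean(w * reward))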