Empirical analysis of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, Y Yue - Real-world Sequential Decision …, 2019 - realworld-sdm.github.io
Off-policy policy evaluation (OPE) is the task of predicting the online performance of a policy
using only pre-collected historical data (collected from an existing deployed policy or set of …

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

H Kiyohara, R Kishimoto, K Kawakami… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces SCOPE-RL, a comprehensive open-source Python software designed
for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection …

Counterfactual-augmented importance sampling for semi-offline policy evaluation

S Tang, J Wiens - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative
evaluation using observational data can help practitioners understand the generalization …

An instrumental variable approach to confounded off-policy evaluation

Y Xu, J Zhu, C Shi, S Luo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …

Consistent on-line off-policy evaluation

A Hallak, S Mannor - International Conference on Machine …, 2017 - proceedings.mlr.press
The problem of on-line off-policy evaluation (OPE) has been actively studied in the last
decade due to its importance both as a stand-alone problem and as a module in a policy …

Policy-adaptive estimator selection for off-policy evaluation

T Udagawa, H Kiyohara, Y Narita, Y Saito… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …

Benchmarks for deep off-policy evaluation

J Fu, M Norouzi, O Nachum, G Tucker, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline
datasets for both evaluating and selecting complex policies for decision making. The ability …

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Reliable off-policy evaluation for reinforcement learning

J Wang, R Gao, H Zha - Operations Research, 2024 - pubsonline.informs.org
In a sequential decision-making problem, off-policy evaluation estimates the expected
cumulative reward of a target policy using logged trajectory data generated from a different …
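Several of the entries above define OPE the same way: estimating the expected cumulative reward of a target (evaluation) policy from trajectories logged by a different behavior policy. A minimal sketch of the ordinary per-trajectory importance-sampling estimator, the baseline most of these papers build on, is below; the function names and the toy bandit data are illustrative, not drawn from any of the listed papers:

```python
import numpy as np

def per_trajectory_is(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling OPE estimate of the target policy's return.

    trajectories: list of trajectories, each a list of (state, action, reward)
                  tuples logged under the behavior policy pi_b.
    pi_e, pi_b:   functions (state, action) -> action probability under the
                  evaluation and behavior policies, respectively.
    gamma:        discount factor.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight the logged trajectory by the likelihood ratio of the
            # two policies, accumulated over time steps.
            weight *= pi_e(s, a) / pi_b(s, a)
            ret += gamma**t * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy one-step bandit: behavior policy is uniform over two actions,
# target policy always picks action 1, which yields reward 1.
logged = [[(None, 0, 0.0)], [(None, 1, 1.0)]]
pi_b = lambda s, a: 0.5
pi_e = lambda s, a: 1.0 if a == 1 else 0.0
estimate = per_trajectory_is(logged, pi_e, pi_b)  # → 1.0, the true value of pi_e
```

The estimator is unbiased whenever the behavior policy puts positive probability on every action the target policy can take, but its variance grows quickly with horizon length, which is the practical weakness that the benchmark and doubly-robust works listed above investigate.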