High-confidence off-policy evaluation

P Thomas, G Theocharous… - Proceedings of the AAAI …, 2015 - ojs.aaai.org
Many reinforcement learning algorithms use trajectories collected from the execution of one
or more policies to propose a new policy. Because execution of a bad policy can be costly or …
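The snippet breaks off, but the estimator this line of work builds on is standard per-trajectory importance sampling: reweight each logged return by the likelihood ratio between the evaluation and behavior policies. A minimal sketch, assuming trajectories are stored as (state, action, reward) triples and that both policies expose action probabilities; the function names here are illustrative, not from the paper:

```python
def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Per-trajectory importance-sampling estimate of the target policy's value.

    trajectories: list of episodes, each a list of (state, action, reward).
    pi_e(s, a) and pi_b(s, a): action probabilities under the evaluation and
    behavior policies (pi_b must be nonzero wherever pi_e is).
    """
    estimates = []
    for episode in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
            ret += (gamma ** t) * r            # discounted return
        estimates.append(weight * ret)         # unbiased if pi_b covers pi_e
    return sum(estimates) / len(estimates)
```

High-confidence OPE pairs estimates like these with a concentration inequality, turning the sample mean into a probabilistic lower bound on the new policy's value before deployment.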

Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions

O Gottesman, J Futoma, Y Liu… - International …, 2020 - proceedings.mlr.press
Off-policy evaluation in reinforcement learning offers the chance to use observational data
to improve future outcomes in domains such as healthcare and education, but safe …

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Y Zhang, J Liu, C Li, Y Niu, Y Yang, Y Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an
offline-pretrained policy using only a few online samples. Built on offline RL algorithms, most …

[CITATION][C] Near optimal provable uniform convergence in off-policy evaluation for reinforcement learning

M Yin, Y Bai, YX Wang - arXiv preprint arXiv:2007.03760, 2020

Towards robust off-policy learning for runtime uncertainty

D Xu, Y Ye, C Ruan, B Yang - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Off-policy learning plays a pivotal role in optimizing and evaluating policies prior to
online deployment. However, during real-time serving, we observe a variety of …

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

S Liu, S Zhang - Forty-first International Conference on Machine … - openreview.net
Most reinforcement learning practitioners evaluate their policies with online Monte Carlo
estimators for either hyperparameter tuning or testing different algorithmic design choices …
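For reference, the online Monte Carlo estimator the snippet mentions simply averages discounted returns over on-policy rollouts. A minimal sketch, assuming a Gymnasium-style env with reset()/step(); this interface is an assumption for illustration:

```python
def monte_carlo_value(env, policy, n_episodes=100, gamma=0.99):
    """Estimate a policy's value by averaging discounted returns
    over on-policy rollouts."""
    returns = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        done, ret, discount = False, 0.0, 1.0
        while not done:
            action = policy(state)
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            ret += discount * reward
            discount *= gamma
        returns.append(ret)
    return sum(returns) / len(returns)
```

The title suggests the paper instead designs the behavior policy from offline data, aiming to make this kind of estimate more sample-efficient than rolling out the target policy directly.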

Forward and backward state abstractions for off-policy evaluation

M Hao, P Su, L Hu, Z Szabo, Q Zhao, C Shi - arXiv preprint arXiv …, 2024 - arxiv.org
Off-policy evaluation (OPE) is crucial for evaluating a target policy's impact offline before its
deployment. However, achieving accurate OPE in large state spaces remains challenging …

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

L Da, P Jenkins, T Schwantes, J Dotson… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In practice, it is essential to compare and rank candidate policies offline before real-world
deployment for safety and reliability. Prior work seeks to solve this offline policy ranking …

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Y Luo, T Ji, F Sun, J Zhang, H Xu, X Zhan - arXiv preprint arXiv …, 2024 - arxiv.org
Off-policy reinforcement learning (RL) has achieved notable success in tackling many
complex real-world tasks by leveraging previously collected data for policy learning …

Gradient temporal-difference learning for off-policy evaluation using emphatic weightings

J Cao, Q Liu, F Zhu, Q Fu, S Zhong - Information Sciences, 2021 - Elsevier
The problem of off-policy evaluation (OPE) has long been regarded as one of the foremost
challenges in reinforcement learning. Gradient-based and emphasis-based temporal …
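The snippet is cut off, but the emphatic weighting it refers to follows Sutton, Mahmood and White (2016). A minimal sketch of linear emphatic TD(λ) for off-policy evaluation, assuming a fixed interest of 1, features phi(s), and per-step importance ratios rho = pi_e(a|s)/pi_b(a|s) supplied with the data; names are illustrative, and episode boundaries are omitted for brevity:

```python
import numpy as np

def emphatic_td(transitions, phi, dim, alpha=0.01, gamma=0.99, lam=0.0):
    """Linear emphatic TD(lambda) for off-policy evaluation.

    transitions: iterable of (s, rho, r, s_next), where rho is the
    per-step importance ratio pi_e(a|s) / pi_b(a|s).
    phi(s): feature vector of length dim; interest i(s) is fixed to 1.
    """
    w = np.zeros(dim)           # value-function weights
    e = np.zeros(dim)           # eligibility trace
    F, rho_prev = 0.0, 1.0      # followon trace, previous step's ratio
    for s, rho, r, s_next in transitions:
        F = rho_prev * gamma * F + 1.0            # followon trace (interest = 1)
        M = lam + (1.0 - lam) * F                 # emphasis
        x, x_next = phi(s), phi(s_next)
        e = rho * (gamma * lam * e + M * x)       # emphatically weighted trace
        delta = r + gamma * (w @ x_next) - w @ x  # TD error
        w = w + alpha * delta * e
        rho_prev = rho
    return w
```

The emphasis M reweights each update by how strongly the target policy would have reached that state, which restores the stability guarantees that plain off-policy TD with function approximation lacks.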