Empirical study of off-policy policy evaluation for reinforcement learning

J Zhu, R Wan, Z Qi, S Luo, C Shi - arXiv preprint arXiv:2310.18715, 2023 - arxiv.org

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …

Cited by 2 · Related articles · All 3 versions

Balancing therapeutic effect and safety in ventilator parameter recommendation: An offline reinforcement learning approach

B Zhang, X Qiu, X Tan - Engineering Applications of Artificial Intelligence, 2024 - Elsevier

Reinforcement learning (RL) is increasingly applied in recommending ventilator parameters,
yet existing methods prioritize therapeutic effect over patient safety. This leads to excessive …

Cited by 1 · Related articles

Safe autonomous racing via approximate reachability on ego-vision

B Chen, J Francis, J Oh, E Nyberg… - arXiv preprint arXiv …, 2021 - arxiv.org

Racing demands each vehicle to drive at its physical limits, when any safety infraction could
lead to catastrophic failure. In this work, we study the problem of safe reinforcement learning …

Cited by 14 · Related articles · All 4 versions

On blame attribution for accountable multi-agent sequential decision making

S Triantafyllou, A Singla… - Advances in Neural …, 2021 - proceedings.neurips.cc

Blame attribution is one of the key aspects of accountable decision making, as it provides
means to quantify the responsibility of an agent for a decision making outcome. In this paper …

Cited by 10 · Related articles · All 10 versions

Sample complexity of offline reinforcement learning with deep ReLU networks

T Nguyen-Tang, S Gupta, H Tran-The… - arXiv preprint arXiv …, 2021 - arxiv.org

Offline reinforcement learning (RL) leverages previously collected data for policy
optimization without any further active exploration. Despite the recent interest in this …

Cited by 16 · Related articles · All 4 versions

On instrumental variable regression for deep offline policy evaluation

Y Chen, L Xu, C Gulcehre, T Le Paine, A Gretton… - Journal of Machine …, 2022 - jmlr.org

We show that the popular reinforcement learning (RL) strategy of estimating the state-action
value (Q-function) by minimizing the mean squared Bellman error leads to a regression …

Cited by 14 · Related articles · All 5 versions

Counterfactual-augmented importance sampling for semi-offline policy evaluation

S Tang, J Wiens - Advances in Neural Information …, 2023 - proceedings.neurips.cc

In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative
evaluation using observational data can help practitioners understand the generalization …

Draftrec: personalized draft recommendation for winning in multi-player online battle arena games

H Lee, D Hwang, H Kim, B Lee, J Choo - Proceedings of the ACM Web …, 2022 - dl.acm.org

This paper presents a personalized character recommendation system for Multiplayer
Online Battle Arena (MOBA) games which are considered as one of the most popular online …

Cited by 11 · Related articles · All 5 versions

Local metric learning for off-policy evaluation in contextual bandits with continuous actions

H Lee, J Lee, Y Choi, W Jeon, BJ Lee… - Advances in …, 2022 - proceedings.neurips.cc

We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic
policies in contextual bandits with continuous action spaces. Our work is motivated by …

Cited by 3 · Related articles · All 8 versions

Hybrid value estimation for off-policy evaluation and offline reinforcement learning

XK Jin, XH Liu, S Jiang, Y Yu - arXiv preprint arXiv:2206.02000, 2022 - arxiv.org

Value function estimation is an indispensable subroutine in reinforcement learning, which
becomes more challenging in the offline setting. In this paper, we propose Hybrid Value …

Cited by 7 · Related articles · All 2 versions
