Robust offline policy evaluation and optimization with heavy-tailed rewards

J Zhu, R Wan, Z Qi, S Luo, C Shi - arXiv preprint arXiv:2310.18715, 2023 - arxiv.org
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …

Balancing therapeutic effect and safety in ventilator parameter recommendation: An offline reinforcement learning approach

B Zhang, X Qiu, X Tan - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
Reinforcement learning (RL) is increasingly applied in recommending ventilator parameters,
yet existing methods prioritize therapeutic effect over patient safety. This leads to excessive …

Safe autonomous racing via approximate reachability on ego-vision

B Chen, J Francis, J Oh, E Nyberg… - arXiv preprint arXiv …, 2021 - arxiv.org
Racing demands each vehicle to drive at its physical limits, when any safety infraction could
lead to catastrophic failure. In this work, we study the problem of safe reinforcement learning …

On blame attribution for accountable multi-agent sequential decision making

S Triantafyllou, A Singla… - Advances in Neural …, 2021 - proceedings.neurips.cc
Blame attribution is one of the key aspects of accountable decision making, as it provides
means to quantify the responsibility of an agent for a decision making outcome. In this paper …

Sample complexity of offline reinforcement learning with deep ReLU networks

T Nguyen-Tang, S Gupta, H Tran-The… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) leverages previously collected data for policy
optimization without any further active exploration. Despite the recent interest in this …

On instrumental variable regression for deep offline policy evaluation

Y Chen, L Xu, C Gulcehre, T Le Paine, A Gretton… - Journal of Machine …, 2022 - jmlr.org
We show that the popular reinforcement learning (RL) strategy of estimating the state-action
value (Q-function) by minimizing the mean squared Bellman error leads to a regression …

Counterfactual-augmented importance sampling for semi-offline policy evaluation

S Tang, J Wiens - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative
evaluation using observational data can help practitioners understand the generalization …

DraftRec: personalized draft recommendation for winning in multi-player online battle arena games

H Lee, D Hwang, H Kim, B Lee, J Choo - Proceedings of the ACM Web …, 2022 - dl.acm.org
This paper presents a personalized character recommendation system for Multiplayer
Online Battle Arena (MOBA) games which are considered as one of the most popular online …

Local metric learning for off-policy evaluation in contextual bandits with continuous actions

H Lee, J Lee, Y Choi, W Jeon, BJ Lee… - Advances in …, 2022 - proceedings.neurips.cc
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic
policies in contextual bandits with continuous action spaces. Our work is motivated by …

Hybrid value estimation for off-policy evaluation and offline reinforcement learning

XK Jin, XH Liu, S Jiang, Y Yu - arXiv preprint arXiv:2206.02000, 2022 - arxiv.org
Value function estimation is an indispensable subroutine in reinforcement learning, which
becomes more challenging in the offline setting. In this paper, we propose Hybrid Value …