On the reuse bias in off-policy reinforcement learning

C Ying, Z Hao, X Zhou, H Su, D Yan, J Zhu - arXiv preprint arXiv …, 2022 - arxiv.org
Importance sampling (IS) is a popular technique in off-policy evaluation that re-weights
the returns of trajectories in the replay buffer to boost sample efficiency. However, training …
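
As context for this entry, the ordinary trajectory-level IS estimator re-weights each logged return by the product of per-step likelihood ratios between the evaluation and behavior policies. Below is a minimal Python sketch of that re-weighting; the trajectory format and the pi_e / pi_b probability interfaces are assumptions made for illustration, not details taken from the paper.

import numpy as np

def is_return_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary (trajectory-level) importance-sampling estimate of the
    evaluation policy's expected return from behavior-policy data.

    trajectories: list of rollouts, each a list of (state, action, reward)
    collected under the behavior policy (hypothetical format).
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluation and
    behavior policies (hypothetical interfaces).
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)   # cumulative likelihood ratio
            ret += (gamma ** t) * r             # discounted return of the trajectory
        estimates.append(weight * ret)          # re-weighted return
    return float(np.mean(estimates))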

Combining model-based and model-free reinforcement learning policies for more efficient sepsis treatment

X Liu, C Yu, Q Huang, L Wang, J Wu… - … Research and Applications …, 2021 - Springer
Sepsis is the main cause of mortality in intensive care units (ICUs), but the optimal treatment
strategy still remains unclear. Managing the treatment of sepsis is challenging because …

Scaling marginalized importance sampling to high-dimensional state-spaces via state abstraction

BS Pavse, JP Hanna - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL),
where the goal is to estimate the performance of an evaluation policy, π_e, using a fixed …
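
For reference, marginalized importance sampling replaces the per-trajectory product of likelihood ratios with a ratio of state-action occupancy measures, which is the quantity the state-abstraction approach above scales to high-dimensional spaces. A minimal sketch of the standard marginalized IS estimator follows; the notation (d_π for occupancies, φ for the abstraction) is assumed here for illustration, not copied from the paper.

J(\pi_e) = \mathbb{E}_{(s,a) \sim d_{\pi_e}}[r(s,a)]
\;\approx\; \frac{1}{n} \sum_{i=1}^{n} \frac{d_{\pi_e}(s_i, a_i)}{d_{\pi_b}(s_i, a_i)} \, r_i,
\qquad (s_i, a_i) \sim d_{\pi_b},

where d_\pi is the (normalized) discounted state-action occupancy of policy \pi; in high-dimensional state spaces the density ratio can be estimated over abstract states \phi(s) rather than raw states.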

Clinical knowledge-guided deep reinforcement learning for sepsis antibiotic dosing recommendations

Y Wang, A Liu, J Yang, L Wang, N Xiong… - Artificial Intelligence in …, 2024 - Elsevier
Sepsis is the third leading cause of death worldwide. Antibiotics are an important component
in the treatment of sepsis. The use of antibiotics is currently facing the challenge of …

Online safety assurance for deep reinforcement learning

NH Rotman, M Schapira, A Tamar - arXiv preprint arXiv:2010.03625, 2020 - arxiv.org
Recently, deep learning has been successfully applied to a variety of networking problems.
A fundamental challenge is that when the operational environment for a learning …

Causal inference using observational intensive care unit data: a systematic review and recommendations for future practice

JM Smit, JH Krijthe, J van Bommel, JA Labrecque… - Medrxiv, 2022 - medrxiv.org
Aim: To review and appraise the quality of studies that present models for causal inference of
time-varying treatment effects in the adult intensive care unit (ICU) and give …

High-confidence off-policy (or counterfactual) variance estimation

Y Chandak, S Shankar, PS Thomas - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Many sequential decision-making systems leverage data collected using prior policies to
propose a new policy. For critical applications, it is important that high-confidence …

Towards high confidence off-policy reinforcement learning for clinical applications

A Jagannatha, P Thomas, H Yu - CausalML Workshop, ICML, 2018 - all.cs.umass.edu
We study the properties of off-policy reinforcement learning algorithms when applied to a
real world clinical scenario. Towards this end, we evaluate standard off-policy training …

Offline Transition Modeling via Contrastive Energy Learning

R Chen, C Jia, Z Huang, TS Liu, XH Liu, Y Yu - Forty-first International … - openreview.net
Learning a high-quality transition model is of great importance for sequential decision-
making tasks, especially in offline settings. Nevertheless, the complex behaviors of transition …

Policy-conditioned Environment Models are More Generalizable

R Chen, XH Chen, Y Sun, S Xiao, M Li, Y Yu - Forty-first International … - openreview.net
In reinforcement learning, it is crucial to have an accurate environment dynamics model to
evaluate different policies' value in downstream tasks like offline policy optimization and …