On the reuse bias in off-policy reinforcement learning

C Ying, Z Hao, X Zhou, H Su, D Yan, J Zhu - arXiv preprint arXiv …, 2022 - arxiv.org
Importance sampling (IS) is a popular technique in off-policy evaluation that re-weights
the returns of trajectories in the replay buffer to boost sample efficiency. However, training …
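
As context for this entry, the ordinary trajectory-level IS estimator re-weights each logged return by the product of per-step likelihood ratios between the evaluation and behavior policies. Below is a minimal Python sketch of that re-weighting; the trajectory format and the pi_e / pi_b probability interfaces are assumptions made for illustration, not details taken from the paper.

import numpy as np

def is_return_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary (trajectory-level) importance-sampling estimate of the
    evaluation policy's expected return from behavior-policy data.

    trajectories: list of rollouts, each a list of (state, action, reward)
    collected under the behavior policy (hypothetical format).
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluation and
    behavior policies (hypothetical interfaces).
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)   # cumulative likelihood ratio
            ret += (gamma ** t) * r             # discounted return of the trajectory
        estimates.append(weight * ret)          # re-weighted return
    return float(np.mean(estimates))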

Combining model-based and model-free reinforcement learning policies for more efficient sepsis treatment

X Liu, C Yu, Q Huang, L Wang, J Wu… - … Research and Applications …, 2021 - Springer
Sepsis is the main cause of mortality in intensive care units (ICUs), but the optimal treatment
strategy still remains unclear. Managing the treatment of sepsis is challenging because …

Scaling marginalized importance sampling to high-dimensional state-spaces via state abstraction

BS Pavse, JP Hanna - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL),
where the goal is to estimate the performance of an evaluation policy, π_e, using a fixed …
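
For reference, marginalized importance sampling replaces the per-trajectory product of likelihood ratios with a ratio of state-action occupancy measures, which is the quantity the state-abstraction approach above scales to high-dimensional spaces. A minimal sketch of the standard marginalized IS estimator follows; the notation (d_π for occupancies, φ for the abstraction) is assumed here for illustration, not copied from the paper.

J(\pi_e) = \mathbb{E}_{(s,a) \sim d_{\pi_e}}[r(s,a)]
\;\approx\; \frac{1}{n} \sum_{i=1}^{n} \frac{d_{\pi_e}(s_i, a_i)}{d_{\pi_b}(s_i, a_i)} \, r_i,
\qquad (s_i, a_i) \sim d_{\pi_b},

where d_\pi is the (normalized) discounted state-action occupancy of policy \pi; in high-dimensional state spaces the density ratio can be estimated over abstract states \phi(s) rather than raw states.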

Clinical knowledge-guided deep reinforcement learning for sepsis antibiotic dosing recommendations

Y Wang, A Liu, J Yang, L Wang, N Xiong… - Artificial Intelligence in …, 2024 - Elsevier
Sepsis is the third leading cause of death worldwide. Antibiotics are an important component
in the treatment of sepsis. The use of antibiotics is currently facing the challenge of …

Online safety assurance for deep reinforcement learning

NH Rotman, M Schapira, A Tamar - arXiv preprint arXiv:2010.03625, 2020 - arxiv.org
Recently, deep learning has been successfully applied to a variety of networking problems.
A fundamental challenge is that when the operational environment for a learning …

Causal inference using observational intensive care unit data: a systematic review and recommendations for future practice

JM Smit, JH Krijthe, J van Bommel, JA Labrecque… - Medrxiv, 2022 - medrxiv.org
Aim: To review and appraise the quality of studies that present models for causal inference of
time-varying treatment effects in the adult intensive care unit (ICU) and give …

High-confidence off-policy (or counterfactual) variance estimation

Y Chandak, S Shankar, PS Thomas - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Many sequential decision-making systems leverage data collected using prior policies to
propose a new policy. For critical applications, it is important that high-confidence …

Towards high confidence off-policy reinforcement learning for clinical applications

A Jagannatha, P Thomas, H Yu - CausalML Workshop, ICML, 2018 - all.cs.umass.edu
We study the properties of off-policy reinforcement learning algorithms when applied to a
real world clinical scenario. Towards this end, we evaluate standard off-policy training …

Offline Transition Modeling via Contrastive Energy Learning

R Chen, C Jia, Z Huang, TS Liu, XH Liu, Y Yu - Forty-first International … - openreview.net
Learning a high-quality transition model is of great importance for sequential decision-
making tasks, especially in offline settings. Nevertheless, the complex behaviors of transition …

Policy-conditioned Environment Models are More Generalizable

R Chen, XH Chen, Y Sun, S Xiao, M Li, Y Yu - Forty-first International … - openreview.net
In reinforcement learning, it is crucial to have an accurate environment dynamics model to
evaluate different policies' value in downstream tasks like offline policy optimization and …