Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

A review of robot learning for manipulation: Challenges, representations, and algorithms

O Kroemer, S Niekum, G Konidaris - Journal of machine learning research, 2021 - jmlr.org
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …

RT-1: Robotics Transformer for real-world control at scale

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine
learning models can solve specific downstream tasks either zero-shot or with small task …

The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care

M Komorowski, LA Celi, O Badawi, AC Gordon… - Nature medicine, 2018 - nature.com
Sepsis is the third leading cause of death worldwide and the main cause of mortality in
hospitals, but the best treatment strategy remains uncertain. In particular, evidence …

CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in neural …, 2020 - proceedings.neurips.cc
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

Verifying learning-augmented systems

T Eliyahu, Y Kazak, G Katz, M Schapira - Proceedings of the 2021 ACM …, 2021 - dl.acm.org
The application of deep reinforcement learning (DRL) to computer and networked systems
has recently gained significant popularity. However, the obscurity of decisions by DRL …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Learning when-to-treat policies

X Nie, E Brunskill, S Wager - Journal of the American Statistical …, 2021 - Taylor & Francis
Many applied decision-making problems have a dynamic component: The policymaker
needs not only to choose whom to treat, but also when to start which treatment. For example …

Off-policy policy evaluation for sequential decisions under unobserved confounding

H Namkoong, R Keramati… - Advances in Neural …, 2020 - proceedings.neurips.cc
When observed decisions depend only on observed features, off-policy policy evaluation
(OPE) methods for sequential decision problems can estimate the performance of evaluation …

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … conference on machine …, 2021 - proceedings.mlr.press
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …