Learning when-to-treat policies

Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org

In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Cited by 1781 · Related articles · All 3 versions

[PDF] neurips.cc

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Cited by 275 · Related articles · All 8 versions
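
The pessimism in this title can be illustrated in the simplest offline setting: a multi-armed bandit with a fixed log of (action, reward) pairs. The sketch below is a generic lower-confidence-bound rule, not the paper's algorithm. Each action is scored by its empirical mean reward minus a penalty that shrinks as 1/sqrt(n) in its sample count n, so an action seen only a few times cannot win on a handful of lucky rewards. The name `lcb_select` and the penalty scale `c` are hypothetical.

```python
import math
from collections import defaultdict


def lcb_select(logged, c=1.0):
    """Pick the logged action with the highest lower confidence bound.

    logged: list of (action, reward) pairs from a fixed offline dataset.
    c: hypothetical penalty scale; larger c means more pessimism.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in logged:
        totals[action] += reward
        counts[action] += 1

    def lower_bound(action):
        n = counts[action]
        mean = totals[action] / n
        # Penalty shrinks as 1/sqrt(n): well-covered actions are penalized less.
        return mean - c / math.sqrt(n)

    return max(counts, key=lower_bound)


# Action "a" is well covered; action "b" was logged once with a lucky reward.
data = [("a", 0.8)] * 50 + [("b", 1.0)]
print(lcb_select(data))         # → a
print(lcb_select(data, c=0.0))  # → b
```

With `c=0.0` the rule reduces to picking the highest empirical mean, which here trusts the single observation of `b`.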

[PDF] neurips.cc

RAMBO-RL: Robust adversarial model-based offline reinforcement learning

M Rigter, B Lacerda, N Hawes - Advances in neural …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …

Cited by 100 · Related articles · All 7 versions

[PDF] arxiv.org

Policy learning with observational data

S Athey, S Wager - Econometrica, 2021 - Wiley Online Library

In many areas, practitioners seek to use observational data to learn a treatment assignment
policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or …

Cited by 439 · Related articles · All 13 versions
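
To make the budget-constrained setting concrete, here is a minimal sketch that assumes known propensity scores and a single scalar covariate. It scores each "treat if x >= t" rule with an inverse-propensity-weighted (IPW) estimate of its mean outcome and keeps the best rule that treats at most a `budget` share of units. Athey and Wager work with doubly robust scores and general policy classes; IPW and the threshold class here are simplifying substitutions, and `ipw_value`, `best_threshold`, and the toy data are hypothetical.

```python
def ipw_value(policy, X, W, Y, e):
    """IPW estimate of the mean outcome if every unit followed `policy`.

    X: covariates, W: observed binary treatments (0/1), Y: observed outcomes,
    e: known propensity scores P(W = 1 | X) per unit, strictly between 0 and 1.
    """
    total = 0.0
    for x, w, y, p in zip(X, W, Y, e):
        if policy(x) == w:
            # Only units whose logged treatment agrees with the policy contribute,
            # reweighted by the probability of the treatment they actually received.
            prob = p if w == 1 else 1.0 - p
            total += y / prob
    return total / len(X)


def best_threshold(X, W, Y, e, budget):
    """Best 'treat if x >= t' rule among those treating at most `budget` share of units."""
    best_t, best_v = None, float("-inf")
    # Candidate thresholds: observed covariate values, plus "treat nobody".
    for t in sorted(set(X)) + [float("inf")]:
        treated_share = sum(x >= t for x in X) / len(X)
        if treated_share > budget:
            continue  # rule violates the budget constraint
        v = ipw_value(lambda x, t=t: int(x >= t), X, W, Y, e)
        if v > best_v:
            best_t, best_v = t, v
    return best_t, best_v


# Toy data: each covariate value appears once treated and once untreated (e = 0.5).
# Treatment raises the outcome from 1 to 2 when x >= 2 and lowers it to 0 otherwise.
X = [0, 0, 1, 1, 2, 2, 3, 3]
W = [0, 1, 0, 1, 0, 1, 0, 1]
Y = [1, 0, 1, 0, 1, 2, 1, 2]
e = [0.5] * 8
print(best_threshold(X, W, Y, e, budget=0.5))  # → (2, 1.5)
```

Here the budget rules out thresholds 0 and 1 (they would treat 100% and 75% of units), and among the remaining rules the one that treats exactly the units who benefit has the highest estimated value.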

[PDF] arxiv.org

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org

We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

Cited by 141 · Related articles · All 11 versions
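
Off-policy policy evaluation estimates how an evaluation policy would perform using only episodes logged under a different behavior policy. A minimal sketch of trajectory-wise importance sampling, a standard baseline OPE estimator, follows. It assumes both policies' action probabilities are known, and the function name, argument names, and toy example are hypothetical.

```python
def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=1.0):
    """Trajectory-wise importance sampling estimate of pi_e's expected return.

    trajectories: list of episodes, each a list of (state, action, reward).
    pi_e(action, state), pi_b(action, state): action probabilities under the
    evaluation and behavior policies (pi_b must be > 0 wherever pi_e > 0).
    """
    estimates = []
    for episode in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += gamma ** t * r              # discounted return of the logged episode
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)


# Hypothetical one-step example: the behavior policy picks actions 0 and 1 uniformly,
# the evaluation policy always picks 1, and action 1 pays 1 while action 0 pays 0.
def pi_b(a, s):
    return 0.5


def pi_e(a, s):
    return 1.0 if a == 1 else 0.0


logged = [[("s0", 0, 0.0)], [("s0", 1, 1.0)]]
print(importance_sampling_ope(logged, pi_e, pi_b))  # → 1.0
```

The estimate is unbiased when the behavior probabilities are correct and cover the evaluation policy's actions, but its variance grows with the product of ratios over long horizons, which is why weighted and per-decision variants are common in practice.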

[PDF] arxiv.org

Offline reinforcement learning: Fundamental barriers for value function approximation

DJ Foster, A Krishnamurthy, D Simchi-Levi… - arXiv preprint arXiv …, 2021 - arxiv.org

We consider the offline reinforcement learning problem, where the aim is to learn a decision
making policy from logged data. Offline RL--particularly when coupled with (value) function …

Cited by 62 · Related articles · All 5 versions

[PDF] wiley.com

Optimal treatment regimes: a review and empirical comparison

Z Li, J Chen, E Laber, F Liu… - International Statistical …, 2023 - Wiley Online Library

A treatment regime is a sequence of decision rules, one per decision point, that maps
accumulated patient information to a recommended intervention. An optimal treatment …

Cited by 9 · Related articles · All 5 versions
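
The definition of a treatment regime above (one decision rule per decision point, each mapping accumulated patient information to an intervention) can be written down directly. The sketch below is only an illustration of that data structure; the two-stage rules, the `severity` field, the thresholds, and the drug names are all hypothetical.

```python
from typing import Callable, Dict, List

Observation = Dict[str, float]
# A decision rule sees all observations and all past actions up to its decision point.
DecisionRule = Callable[[List[Observation], List[str]], str]


def apply_regime(regime: List[DecisionRule], observations: List[Observation]) -> List[str]:
    """Apply rule k at decision point k to the patient information accumulated so far."""
    actions: List[str] = []
    for k, rule in enumerate(regime):
        actions.append(rule(observations[: k + 1], actions[:]))
    return actions


# Hypothetical two-stage regime; thresholds and drug names are illustrative only.
def stage1(obs: List[Observation], past: List[str]) -> str:
    return "drug_A" if obs[-1]["severity"] > 5 else "drug_B"


def stage2(obs: List[Observation], past: List[str]) -> str:
    improved = obs[-1]["severity"] < obs[0]["severity"]
    # Keep a treatment that worked; otherwise switch to the other drug.
    if improved:
        return past[-1]
    return "drug_B" if past[-1] == "drug_A" else "drug_A"


regime = [stage1, stage2]
print(apply_regime(regime, [{"severity": 7.0}, {"severity": 8.0}]))  # → ['drug_A', 'drug_B']
```

An optimal regime is then the choice of rules that maximizes the expected outcome when every patient is treated this way.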

[PDF] neurips.cc

Off-policy policy evaluation for sequential decisions under unobserved confounding

H Namkoong, R Keramati… - Advances in Neural …, 2020 - proceedings.neurips.cc

When observed decisions depend only on observed features, off-policy policy evaluation
(OPE) methods for sequential decision problems can estimate the performance of evaluation …

Cited by 69 · Related articles · All 6 versions

[PDF] aaai.org

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org

Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Cited by 13 · Related articles · All 7 versions

[PDF] neurips.cc

Active offline policy selection

K Konyushova, Y Chen, T Paine… - Advances in …, 2021 - proceedings.neurips.cc

This paper addresses the problem of policy selection in domains with abundant logged data,
but with a restricted interaction budget. Solving this problem would enable safe evaluation …

Cited by 24 · Related articles · All 7 versions
