Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Statistical inference of the value function for reinforcement learning in infinite-horizon settings

C Shi, S Zhang, W Lu, R Song - Journal of the Royal Statistical …, 2022 - academic.oup.com
Reinforcement learning is a general technique that allows an agent to learn an optimal
policy and interact with an environment in sequential decision-making problems. The …

When is realizability sufficient for off-policy reinforcement learning?

A Zanette - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Understanding when reinforcement learning algorithms can make successful off-policy
predictions, and when they may fail to do so, remains an open problem. Typically, model …

Off-policy confidence interval estimation with confounded Markov decision process

C Shi, J Zhu, Y Shen, S Luo, H Zhu… - Journal of the American …, 2024 - Taylor & Francis
This article is concerned with constructing a confidence interval for a target policy's value
offline based on pre-collected observational data in infinite horizon settings. Most of the …

Instabilities of offline RL with pre-trained neural representation

R Wang, Y Wu, R Salakhutdinov… - … on Machine Learning, 2021 - proceedings.mlr.press
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn)
policies in scenarios where the data are collected from a distribution that substantially differs …

Nearly horizon-free offline reinforcement learning

T Ren, J Li, B Dai, SS Du… - Advances in neural …, 2021 - proceedings.neurips.cc
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision
Processes (MDPs). For tabular MDPs with $S$ states and $A$ actions, or linear MDPs with …

Post-contextual-bandit inference

A Bibaut, M Dimakopoulou, N Kallus… - Advances in neural …, 2021 - proceedings.neurips.cc
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in
e-commerce, healthcare, and policymaking because they can both improve outcomes for …

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … Conference on Machine …, 2021 - proceedings.mlr.press
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …

Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders

A Bennett, N Kallus, L Li… - … Conference on Artificial …, 2021 - proceedings.mlr.press
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …