A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

A review of recent advances in empirical likelihood

P Liu, Y Zhao - Wiley Interdisciplinary Reviews: Computational …, 2023 - Wiley Online Library
Empirical likelihood is widely used in many statistical problems. In this article, we provide a
review of the empirical likelihood method, due to its significant development in recent years …

OptiDICE: Offline policy optimization via stationary distribution correction estimation

J Lee, W Jeon, B Lee, J Pineau… - … Conference on Machine …, 2021 - proceedings.mlr.press
We consider the offline reinforcement learning (RL) setting where the agent aims to optimize
the policy solely from the data without further environment interactions. In offline RL, the …

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning

J Guan, G Chen, J Ji, L Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …

Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters

K Ghasemipour, SS Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Motivated by the success of ensembles for uncertainty estimation in supervised learning, we
take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary …

Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics

W Yang, L Zhang, Z Zhang - The Annals of Statistics, 2022 - projecteuclid.org
Published in The Annals of Statistics, 2022, Vol. 50, No. 6, pp. 3223–3248.

A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes

C Shi, M Uehara, J Huang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Although they hold the promise of overcoming the …