CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in neural …, 2020 - proceedings.neurips.cc
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

[PDF] CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - ualberta.ca
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

[PDF] CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - sites.ualberta.ca
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - Proceedings of the 34th …, 2020 - dl.acm.org
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - arXiv e …, 2020 - ui.adsabs.harvard.edu
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

[PDF] CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - webdocs.cs.ualberta.ca
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - aminer.cn
Abstract: We study high-confidence behavior-agnostic off-policy evaluation in reinforcement
learning, where the goal is to estimate a confidence interval on a target policy's value, given …

CoinDICE: Off-Policy Confidence Interval Estimation

B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - arXiv preprint arXiv …, 2020 - arxiv.org
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

[PDF] CoinDICE: Off-policy Confidence Interval Estimation

B Dai - simons.berkeley.edu
CoinDICE: Off-policy Confidence Interval Estimation. Bo Dai, Google Research, Brain Team;
joint work with Ofir Nachum, Yinlam …