- Academic resource search

Inference for batched bandits

K Zhang, L Janson, S Murphy - Advances in neural …, 2020 - proceedings.neurips.cc

As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …

Cited by 107 · Related articles · All 11 versions

[PDF] arxiv.org

Off-policy evaluation via adaptive weighting with data from contextual bandits

R Zhan, V Hadad, DA Hirshberg, S Athey - Proceedings of the 27th ACM …, 2021 - dl.acm.org

It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …

Cited by 63 · Related articles · All 5 versions

[HTML] nih.gov

[HTML] Response-adaptive randomization in clinical trials: from myths to practical considerations

DS Robertson, KM Lee… - Statistical science: a …, 2023 - ncbi.nlm.nih.gov

Response-Adaptive Randomization (RAR) is part of a wider class of data-dependent
sampling algorithms, for which clinical trials are typically used as a motivating …

Cited by 90 · Related articles · All 14 versions

[PDF] arxiv.org

Policy learning with adaptively collected data

R Zhan, Z Ren, S Athey, Z Zhou - Management Science, 2024 - pubsonline.informs.org

In a wide variety of applications, including healthcare, bidding in first price auctions, digital
recommendations, and online education, it can be beneficial to learn a policy that assigns …

Cited by 38 · Related articles · All 7 versions

[PDF] neurips.cc

A closer look at the worst-case behavior of multi-armed bandit algorithms

A Kalvit, A Zeevi - Advances in Neural Information …, 2021 - proceedings.neurips.cc

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB)
problem is the difference between mean rewards in the top two arms, also known as the …

Cited by 38 · Related articles · All 8 versions

[PDF] mlr.press

Metalearning linear bandits by prior update

A Peleg, N Pearl, R Meir - International Conference on …, 2022 - proceedings.mlr.press

Fully Bayesian approaches to sequential decision-making assume that problem parameters
are generated from a known prior. In practice, such information is often lacking. This problem …

Cited by 21 · Related articles · All 4 versions

[PDF] neurips.cc

A unified framework for bandit multiple testing

Z Xu, R Wang, A Ramdas - Advances in Neural Information …, 2021 - proceedings.neurips.cc

In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that
we wish to test, and the goal is to design adaptive algorithms that correctly identify large set …

Cited by 19 · Related articles · All 9 versions

[PDF] arxiv.org

Near-optimal inference in adaptive linear regression

K Khamaru, Y Deshpande, T Lattimore… - arXiv preprint arXiv …, 2021 - arxiv.org

When data is collected in an adaptive manner, even simple methods like ordinary least
squares can exhibit non-normal asymptotic behavior. As an undesirable consequence …

Cited by 23 · Related articles · All 2 versions

[PDF] neurips.cc

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc

Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Cited by 4 · Related articles · All 11 versions

[PDF] mlr.press

Safe optimal design with applications in off-policy learning

R Zhu, B Kveton - International Conference on Artificial …, 2022 - proceedings.mlr.press

Motivated by practical needs in online experimentation and off-policy learning, we study the
problem of safe optimal design, where we develop a data logging policy that efficiently …

Cited by 10 · Related articles · All 2 versions
