Efficient contextual bandits with continuous actions

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1235 · Related articles · All 7 versions

[PDF] mlr.press

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press

A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Cited by 37 · Related articles · All 3 versions

[PDF] arxiv.org

Probabilistic design of optimal sequential decision-making algorithms in learning and control

É Garrabé, G Russo - Annual Reviews in Control, 2022 - Elsevier

This survey is focused on certain sequential decision-making problems that involve
optimizing over probability functions. We discuss the relevance of these problems for …

Cited by 11 · Related articles · All 4 versions

[PDF] neurips.cc

Rehearsal learning for avoiding undesired future

T Qin, TZ Wang, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Machine learning (ML) models have been widely used to make predictions. Instead
of a predictive statement about future outcomes, in many situations we want to pursue a …

Cited by 4 · Related articles · All 3 versions

[PDF] neurips.cc

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc

In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …

Cited by 10 · Related articles · All 6 versions

[PDF] mlr.press

Deep hierarchy in bandits

J Hong, B Kveton, S Katariya… - International …, 2022 - proceedings.mlr.press

Mean rewards of actions are often correlated. The form of these correlations may be
complex and unknown a priori, such as the preferences of users for recommended products …

Cited by 22 · Related articles · All 6 versions

[PDF] mlr.press

Contextual bandits with smooth regret: Efficient learning in continuous action spaces

Y Zhu, P Mineiro - International Conference on Machine …, 2022 - proceedings.mlr.press

Designing efficient general-purpose contextual bandit algorithms that work with large—or
even infinite—action spaces would facilitate application to important scenarios such as …

Cited by 16 · Related articles · All 3 versions

[PDF] mlr.press

Top-k extreme contextual bandits with arm hierarchy

R Sen, A Rakhlin, L Ying, R Kidambi… - International …, 2021 - proceedings.mlr.press

Motivated by modern applications, such as online advertisement and recommender
systems, we study the top-$k$ extreme contextual bandits problem, where the total number …

Cited by 30 · Related articles · All 10 versions

[PDF] mlr.press

Oracle-efficient pessimism: Offline policy optimization in contextual bandits

L Wang, A Krishnamurthy… - … Conference on Artificial …, 2024 - proceedings.mlr.press

We consider offline policy optimization (OPO) in contextual bandits, where one is given a
fixed dataset of logged interactions. While pessimistic regularizers are typically used to …

Cited by 8 · Related articles · All 3 versions

[PDF] arxiv.org

Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances

M Khodak, E Chow, MF Balcan, A Talwalkar - arXiv preprint arXiv …, 2023 - arxiv.org

Solving a linear system $Ax = b$ is a fundamental scientific computing primitive for which
numerous solvers and preconditioners have been developed. These come with parameters …

Cited by 5 · Related articles · All 4 versions
