Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Probabilistic design of optimal sequential decision-making algorithms in learning and control

É Garrabé, G Russo - Annual Reviews in Control, 2022 - Elsevier
This survey is focused on certain sequential decision-making problems that involve
optimizing over probability functions. We discuss the relevance of these problems for …

Rehearsal learning for avoiding undesired future

T Qin, TZ Wang, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Machine learning (ML) models have been widely used to make predictions. Instead
of a predictive statement about future outcomes, in many situations we want to pursue a …

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …

Deep hierarchy in bandits

J Hong, B Kveton, S Katariya… - International …, 2022 - proceedings.mlr.press
Mean rewards of actions are often correlated. The form of these correlations may be
complex and unknown a priori, such as the preferences of users for recommended products …

Contextual bandits with smooth regret: Efficient learning in continuous action spaces

Y Zhu, P Mineiro - International Conference on Machine …, 2022 - proceedings.mlr.press
Designing efficient general-purpose contextual bandit algorithms that work with large—or
even infinite—action spaces would facilitate application to important scenarios such as …

Top-k extreme contextual bandits with arm hierarchy

R Sen, A Rakhlin, L Ying, R Kidambi… - International …, 2021 - proceedings.mlr.press
Motivated by modern applications, such as online advertisement and recommender
systems, we study the top-$k$ extreme contextual bandits problem, where the total number …

Oracle-efficient pessimism: Offline policy optimization in contextual bandits

L Wang, A Krishnamurthy… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We consider offline policy optimization (OPO) in contextual bandits, where one is given a
fixed dataset of logged interactions. While pessimistic regularizers are typically used to …

Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances

M Khodak, E Chow, MF Balcan, A Talwalkar - arXiv preprint arXiv …, 2023 - arxiv.org
Solving a linear system $Ax = b$ is a fundamental scientific computing primitive for which
numerous solvers and preconditioners have been developed. These come with parameters …