A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models …
This survey is focused on certain sequential decision-making problems that involve optimizing over probability functions. We discuss the relevance of these problems for …
T Qin, TZ Wang, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Abstract: Machine learning (ML) models have been widely used to make predictions. Instead of a predictive statement about future outcomes, in many situations we want to pursue a …
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …
Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of users for recommended products …
Y Zhu, P Mineiro - International Conference on Machine …, 2022 - proceedings.mlr.press
Designing efficient general-purpose contextual bandit algorithms that work with large—or even infinite—action spaces would facilitate application to important scenarios such as …
R Sen, A Rakhlin, L Ying, R Kidambi… - International …, 2021 - proceedings.mlr.press
Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number …
L Wang, A Krishnamurthy… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to …
Solving a linear system $Ax = b$ is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters …