X-Armed Bandits

S Bubeck, R Munos, G Stoltz, C Szepesvári - Journal of Machine Learning …, 2011 - jmlr.org
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be
a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect …

Pure exploration in multi-armed bandits problems

S Bubeck, R Munos, G Stoltz - …, ALT 2009, Porto, Portugal, October 3-5 …, 2009 - Springer
We consider the framework of stochastic multi-armed bandit problems and study the
possibilities and limitations of strategies that perform an online exploration of the arms. The …
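As a rough illustration of the pure-exploration setting in the entry above, the sketch below spends a fixed budget of pulls in round-robin order (uniform allocation), recommends the arm with the highest empirical mean, and scores that recommendation by its simple regret: the gap between the best mean and the mean of the recommended arm. It assumes Bernoulli arms with arbitrarily chosen means, and the function names are hypothetical; it is a minimal sketch of one simple strategy, not the authors' own forecaster.

```python
import random


def uniform_then_recommend(means, budget, rng):
    """Pure-exploration sketch (hypothetical helper): uniform round-robin
    allocation, then recommend the empirically best arm.
    `means` are assumed Bernoulli arm means, used only to simulate rewards."""
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # round-robin: each arm gets budget // k pulls (some get one more)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
    # Empirical means; an arm never pulled (budget < k) gets -inf so it is not recommended
    empirical = [s / c if c > 0 else float("-inf") for s, c in zip(sums, counts)]
    recommended = max(range(k), key=lambda i: empirical[i])
    return recommended, counts


def simple_regret(means, recommended):
    """Simple regret: best mean minus the mean of the recommended arm."""
    return max(means) - means[recommended]


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]  # hypothetical arm means
    arm, counts = uniform_then_recommend(means, budget=3000, rng=rng)
    print("recommended arm:", arm, "pulls:", counts)
    print("simple regret:", simple_regret(means, arm))
```

Unlike cumulative regret, the rewards collected during exploration do not count here; only the quality of the final recommendation matters, which is why spending pulls evenly on clearly bad arms is not penalized directly.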

Real-time bidding for online advertising: measurement and analysis

S Yuan, J Wang, X Zhao - … of the seventh international workshop on data …, 2013 - dl.acm.org
Real-time bidding (RTB), aka programmatic buying, has recently become the fastest-growing
area in online advertising. Instead of bulk buying and inventory-centric buying …

Pure exploration in finitely-armed and continuous-armed bandits

S Bubeck, R Munos, G Stoltz - Theoretical Computer Science, 2011 - Elsevier
We consider the framework of stochastic multi-armed bandit problems and study the
possibilities and limitations of forecasters that perform an on-line exploration of the arms …

Bandits with concave rewards and convex knapsacks

S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org
In this paper, we consider a very general model for exploration-exploitation tradeoff which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …

Linear contextual bandits with knapsacks

S Agrawal, NR Devanur - Advances in neural information …, 2016 - proceedings.neurips.cc
We consider the linear contextual bandit problem with resource consumption, in addition to
reward generation. In each round, the outcome of pulling an arm is a reward as well as a …

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

S Agrawal, NR Devanur, L Li - Conference on Learning …, 2016 - proceedings.mlr.press
We consider a contextual version of the multi-armed bandit problem with global knapsack
constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource …
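The knapsack entries above describe rounds in which pulling an arm yields a reward together with resource consumption. The sketch below only illustrates that interaction loop under simplifying assumptions not taken from the snippets: each arm returns a fixed, hypothetical (reward, consumption vector) pair instead of a random one, every resource has a budget, and the loop stops at the horizon or as soon as the chosen pull would exceed some budget. The policy `bang_per_buck` is a hypothetical placeholder, not the algorithm from any of the cited papers.

```python
from typing import List, Sequence, Tuple

# Hypothetical arm model: each pull deterministically returns (reward, consumption vector).
Arm = Tuple[float, Sequence[float]]


def run_bwk(arms: List[Arm], budgets: Sequence[float], horizon: int, choose):
    """Simulate a bandits-with-knapsacks style loop.

    Each round, the policy `choose` picks an arm; the outcome is a scalar
    reward plus a vector of resource consumption. Stops at the horizon or
    when the chosen pull would push some resource over its budget."""
    used = [0.0] * len(budgets)
    total_reward = 0.0
    rounds = 0
    for _ in range(horizon):
        arm = choose(arms, used, budgets)
        reward, cost = arms[arm]
        if any(u + c > b for u, c, b in zip(used, cost, budgets)):
            break  # this pull would violate a budget, so the process ends
        used = [u + c for u, c in zip(used, cost)]
        total_reward += reward
        rounds += 1
    return total_reward, used, rounds


def bang_per_buck(arms, used, budgets):
    """Placeholder policy: highest reward per unit of the arm's largest
    budget-normalized resource use (ignores consumption so far)."""
    def score(i):
        reward, cost = arms[i]
        worst = max(c / b for c, b in zip(cost, budgets))
        return reward / worst if worst > 0 else float("inf")
    return max(range(len(arms)), key=score)
```

For example, with `arms = [(3.0, [2.0, 0.0]), (1.0, [1.0, 1.0])]`, budgets `[10.0, 5.0]`, and a horizon of 100, the placeholder policy always picks the first arm and the loop stops after five rounds with total reward 15.0, because a sixth pull would exceed the first budget. The algorithms in the cited papers instead have to learn unknown, random rewards and consumptions while respecting the budgets.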

Truncated variance reduction: A unified approach to Bayesian optimization and level-set estimation

I Bogunovic, J Scarlett, A Krause… - Advances in neural …, 2016 - proceedings.neurips.cc
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian
optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified …

Safety-aware algorithms for adversarial contextual bandit

W Sun, D Dey, A Kapoor - International Conference on …, 2017 - proceedings.mlr.press
In this work, we study the safe sequential decision-making problem under the setting of
adversarial contextual bandits with sequential risk constraints. At each round, nature …

Bandits with global convex constraints and objective

S Agrawal, NR Devanur - Operations Research, 2019 - pubsonline.informs.org
We consider a very general model for managing the exploration–exploitation trade-off,
which allows global convex constraints and concave objective on the aggregate decisions …