We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that perform an online exploration of the arms. The …
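To make the pure-exploration setting concrete, the sketch below (in Python, not an algorithm from the paper) spends a fixed sampling budget uniformly over a handful of Bernoulli arms and then recommends the arm with the best empirical mean; the quality of the recommendation is measured by simple regret. The arm means and the budget are made-up values.

```python
# Toy pure-exploration run: uniform (round-robin) allocation of a fixed
# sampling budget, followed by recommending the empirically best arm.
# Arm means and the budget are illustrative, not from any cited paper.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # unknown to the forecaster
budget = 300                             # total number of pulls allowed

counts = np.zeros(len(true_means))
sums = np.zeros(len(true_means))

for t in range(budget):
    arm = t % len(true_means)            # round-robin exploration
    counts[arm] += 1
    sums[arm] += rng.binomial(1, true_means[arm])

recommended = int(np.argmax(sums / counts))
simple_regret = true_means.max() - true_means[recommended]
print(f"recommended arm: {recommended}, simple regret: {simple_regret:.3f}")
```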
S Yuan, J Wang, X Zhao - … of the seventh international workshop on data …, 2013 - dl.acm.org
Real-time bidding (RTB), aka programmatic buying, has recently become the fastest-growing area in online advertising. Instead of bulk buying and inventory-centric buying …
S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org
In this paper, we consider a very general model for the exploration-exploitation tradeoff, which allows arbitrary concave rewards and convex constraints on the decisions across time, in …
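As a concrete (and heavily simplified) special case of such constrained exploration-exploitation problems, the sketch below runs plain UCB1 over Bernoulli arms with a single scalar budget that each pull consumes, and stops when the budget is exhausted. It ignores the concave-reward / convex-constraint generality of the paper, and all numbers are synthetic; a proper bandits-with-knapsacks method would also index arms by reward per unit of resource rather than reward alone.

```python
# Toy budgeted bandit: UCB1 on rewards, stop when the scalar budget runs out.
# Reward/cost probabilities and the budget are made-up illustrative values.
import numpy as np

rng = np.random.default_rng(1)
reward_means = np.array([0.2, 0.5, 0.8])
cost_means = np.array([0.3, 0.5, 0.9])   # expected resource use per pull
budget = 50.0

K = len(reward_means)
counts, reward_sums = np.zeros(K), np.zeros(K)
total_reward, spent, t = 0.0, 0.0, 0

while spent < budget:
    t += 1
    if t <= K:
        arm = t - 1                       # pull each arm once to initialise
    else:
        ucb = reward_sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    counts[arm] += 1
    reward = rng.binomial(1, reward_means[arm])
    reward_sums[arm] += reward
    total_reward += reward
    spent += rng.binomial(1, cost_means[arm])

print(f"total reward: {total_reward:.0f}, pulls: {t}, budget spent: {spent:.0f}")
```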
S Agrawal, N Devanur - Advances in neural information …, 2016 - proceedings.neurips.cc
We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a …
We consider a contextual version of multi-armed bandit problem with global knapsack constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource …
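The shape of the contextual-with-knapsacks problem can be illustrated with a short LinUCB-style loop that also tracks a single resource and stops once a budget is exhausted. This is only a sketch under synthetic contexts, costs, and a hypothetical budget; it is not the algorithm analysed in these papers, which have to balance reward against resource consumption rather than simply stopping early.

```python
# LinUCB-style contextual bandit with one global resource constraint (sketch).
# Hidden parameters, contexts, costs, and the budget are all synthetic.
import numpy as np

rng = np.random.default_rng(2)
d, K, budget, alpha = 5, 3, 40.0, 1.0
theta = rng.normal(size=(K, d))           # hidden per-arm reward parameters
cost_means = np.array([0.4, 0.6, 0.8])    # expected resource use per pull

A = [np.eye(d) for _ in range(K)]         # per-arm ridge-regression statistics
b = [np.zeros(d) for _ in range(K)]
spent, total_reward = 0.0, 0.0

while spent < budget:
    x = rng.normal(size=d)                # context observed this round
    ucbs = []
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        mu_hat = A_inv @ b[k]
        ucbs.append(x @ mu_hat + alpha * np.sqrt(x @ A_inv @ x))
    arm = int(np.argmax(ucbs))            # optimistic reward estimate
    reward = x @ theta[arm] + rng.normal(scale=0.1)
    A[arm] += np.outer(x, x)
    b[arm] += reward * x
    total_reward += reward
    spent += rng.binomial(1, cost_means[arm])

print(f"total reward: {total_reward:.2f}, budget spent: {spent:.0f}")
```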
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified …
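TruVaR itself is not reproduced here, but the Gaussian-process machinery it builds on is easy to sketch: the loop below runs a plain GP-UCB acquisition on a toy 1-D objective using scikit-learn's GaussianProcessRegressor. The objective, kernel, noise level, and exploration weight beta are all illustrative choices.

```python
# Plain GP-UCB on a toy 1-D function (illustration of GP-based BO, not TruVaR).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    return np.sin(3 * x) + 0.5 * x        # "unknown" toy objective

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 3.0, 200).reshape(-1, 1)       # candidate inputs
X = rng.uniform(0.0, 3.0, size=(2, 1))                 # two random initial points
y = f(X).ravel() + rng.normal(scale=0.05, size=len(X))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
beta = 2.0                                             # exploration weight

for _ in range(15):
    gp.fit(X, y)
    mean, std = gp.predict(grid, return_std=True)
    nxt = grid[np.argmax(mean + beta * std)]           # UCB acquisition rule
    X = np.vstack([X, nxt])
    y = np.append(y, f(nxt[0]) + rng.normal(scale=0.05))

best = X[np.argmax(y)]
print(f"best input found: {best[0]:.3f}, best observed value: {y.max():.3f}")
```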
W Sun, D Dey, A Kapoor - International Conference on …, 2017 - proceedings.mlr.press
In this work we study the safe sequential decision making problem under the setting of adversarial contextual bandits with sequential risk constraints. At each round, nature …
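The paper's own algorithm is not reproduced here; as a rough illustration of folding a risk signal into an adversarial bandit update, the sketch below runs standard Exp3 where each round's payoff is the observed reward minus a fixed Lagrangian-style penalty on the observed risk. The penalty weight, learning rate, and reward/risk streams are synthetic assumptions.

```python
# Exp3 with a fixed risk penalty folded into the payoff (illustration only).
import numpy as np

rng = np.random.default_rng(4)
K, T = 3, 2000
eta, lam, gamma = 0.05, 0.5, 0.1          # step size, risk penalty, exploration
weights = np.ones(K)

def draw_outcome(arm):
    """Synthetic reward/risk in {0, 1}; arm 2 is rewarding but risky."""
    reward = rng.binomial(1, [0.4, 0.5, 0.8][arm])
    risk = rng.binomial(1, [0.1, 0.2, 0.9][arm])
    return reward, risk

total_reward = total_risk = 0
for _ in range(T):
    probs = (1 - gamma) * weights / weights.sum() + gamma / K
    arm = rng.choice(K, p=probs)
    reward, risk = draw_outcome(arm)
    total_reward += reward
    total_risk += risk
    payoff = reward - lam * risk           # penalised payoff
    weights[arm] *= np.exp(eta * payoff / probs[arm])  # importance-weighted update
    weights /= weights.max()               # rescale for numerical stability

print(f"avg reward: {total_reward / T:.3f}, avg risk: {total_risk / T:.3f}")
```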
We consider a very general model for managing the exploration–exploitation trade-off, which allows global convex constraints and a concave objective on the aggregate decisions …