The budgeted multi-armed bandit problem

[PDF] X-Armed Bandits.

S Bubeck, R Munos, G Stoltz, C Szepesvári - Journal of Machine Learning …, 2011 - jmlr.org

We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be
a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect …

Cited by 519 Related articles All 32 versions

[PDF] arxiv.org

Pure exploration in multi-armed bandits problems

S Bubeck, R Munos, G Stoltz - … , ALT 2009, Porto, Portugal, October 3-5 …, 2009 - Springer

We consider the framework of stochastic multi-armed bandit problems and study the
possibilities and limitations of strategies that perform an online exploration of the arms. The …

Cited by 640 Related articles All 31 versions

[PDF] arxiv.org

Real-time bidding for online advertising: measurement and analysis

S Yuan, J Wang, X Zhao - … of the seventh international workshop on data …, 2013 - dl.acm.org

The real-time bidding (RTB), aka programmatic buying, has recently become the fastest
growing area in online advertising. Instead of bulking buying and inventory-centric buying …

Cited by 357 Related articles All 10 versions

[PDF] sciencedirect.com

Pure exploration in finitely-armed and continuous-armed bandits

S Bubeck, R Munos, G Stoltz - Theoretical Computer Science, 2011 - Elsevier

We consider the framework of stochastic multi-armed bandit problems and study the
possibilities and limitations of forecasters that perform an on-line exploration of the arms …

Cited by 333 Related articles All 22 versions

[PDF] arxiv.org

Bandits with concave rewards and convex knapsacks

S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org

In this paper, we consider a very general model for exploration-exploitation tradeoff which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …

Cited by 238 Related articles All 6 versions

[PDF] neurips.cc

Linear contextual bandits with knapsacks

S Agrawal, N Devanur - Advances in neural information …, 2016 - proceedings.neurips.cc

We consider the linear contextual bandit problem with resource consumption, in addition to
reward generation. In each round, the outcome of pulling an arm is a reward as well as a …

Cited by 176 Related articles All 8 versions

[PDF] mlr.press

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

S Agrawal, NR Devanur, L Li - Conference on Learning …, 2016 - proceedings.mlr.press

We consider a contextual version of multi-armed bandit problem with global knapsack
constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource …

Cited by 112 Related articles All 6 versions

[PDF] neurips.cc

Truncated variance reduction: A unified approach to bayesian optimization and level-set estimation

I Bogunovic, J Scarlett, A Krause… - Advances in neural …, 2016 - proceedings.neurips.cc

We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian
optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified …

Cited by 99 Related articles All 14 versions

[PDF] mlr.press

Safety-aware algorithms for adversarial contextual bandit

W Sun, D Dey, A Kapoor - International Conference on …, 2017 - proceedings.mlr.press

In this work we study the safe sequential decision making problem under the setting of
adversarial contextual bandits with sequential risk constraints. At each round, nature …

Cited by 78 Related articles All 6 versions

Bandits with global convex constraints and objective

S Agrawal, NR Devanur - Operations Research, 2019 - pubsonline.informs.org

We consider a very general model for managing the exploration–exploitation trade-off,
which allows global convex constraints and concave objective on the aggregate decisions …

Cited by 47 Related articles All 5 versions
