E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option …
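The exploration-exploitation trade-off described in this snippet can be illustrated with a minimal epsilon-greedy bandit. This sketch is not from the cited work; the arm means, epsilon, and round count are illustrative assumptions:

```python
import random

def epsilon_greedy(means, epsilon=0.1, rounds=10_000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: with probability epsilon
    explore a uniformly random arm, otherwise exploit the arm with the
    best empirical mean so far. `means` are the (unknown-to-the-agent)
    true success probabilities."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # pulls per arm
    totals = [0.0] * k  # cumulative reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)  # explore (or force one pull of each arm)
        else:
            arm = max(range(k), key=lambda a: totals[a] / counts[a])  # exploit
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli payoff
        counts[arm] += 1
        totals[arm] += reward
    best = max(range(k), key=lambda a: totals[a] / counts[a])
    return best, counts

best, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

"Staying with the option" that looks best corresponds to the exploit branch; the epsilon chance of a random pull is the exploration that prevents the agent from locking onto a suboptimal arm early.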
S Shalev-Shwartz - Foundations and Trends® in Machine …, 2012 - nowpublishers.com
Online learning is a well established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions …
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function …
D Russo, B Van Roy - Journal of Machine Learning Research, 2016 - jmlr.org
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from …
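Thompson sampling, the subject of the analysis above, is easiest to see in the Bernoulli-bandit special case: maintain a Beta posterior per arm, sample from each posterior, and play the argmax. A minimal sketch (the arm means and horizon are illustrative, not from the paper):

```python
import random

def thompson_bernoulli(means, rounds=5_000, seed=1):
    """Thompson sampling for a Bernoulli bandit. Each arm keeps a
    Beta(alpha, beta) posterior over its success probability; each
    round we draw one sample per arm and play the highest draw."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # prior successes + 1 (uniform Beta(1,1) prior)
    beta = [1.0] * k   # prior failures + 1
    pulls = [0] * k
    for _ in range(rounds):
        draws = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        reward = 1 if rng.random() < means[arm] else 0
        alpha[arm] += reward      # posterior update on success
        beta[arm] += 1 - reward   # posterior update on failure
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.3, 0.6])
```

Randomizing the arm choice by posterior sampling is what lets the information-theoretic analysis relate regret to the information gained about the optimal action.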
A Krause, C Ong - Advances in neural information …, 2011 - proceedings.neurips.cc
How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant …
In the classical stochastic k-armed bandit problem, in each of a sequence of T rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown …
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and …
A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff …
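The round-by-round protocol described here (choose from a fixed set of alternatives, observe a payoff) is what algorithms such as UCB1 operate over. A hedged sketch of that loop, with illustrative arm means and an assumed Bernoulli payoff model, tracking regret against the best fixed arm:

```python
import math
import random

def ucb1(means, rounds=10_000, seed=2):
    """UCB1 on a stochastic k-armed bandit: play each arm once, then
    play the arm maximizing empirical mean + sqrt(2 ln t / n_a).
    Returns per-arm pull counts and cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    best_mean = max(means)
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]  # pseudo-regret vs. best fixed arm
    return counts, regret

counts, regret = ucb1([0.4, 0.7])
```

The confidence-width term shrinks as an arm is pulled more, so the algorithm concentrates its choices on the best arm while its regret grows only logarithmically in the number of rounds.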