[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Learning to optimize via posterior sampling

D Russo, B Van Roy - Mathematics of Operations Research, 2014 - pubsonline.informs.org
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …

Near-optimal regret bounds for Thompson sampling

S Agrawal, N Goyal - Journal of the ACM (JACM), 2017 - dl.acm.org
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is
a randomized algorithm based on Bayesian ideas and has recently generated significant …
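The randomized Bayesian idea behind Thompson sampling, as summarized in the entries above, can be sketched in its standard Beta-Bernoulli form (a minimal illustration, not code from any of the cited papers; the arm means and horizon are hypothetical):

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a K-armed bandit.

    Keeps a Beta(successes + 1, failures + 1) posterior for each arm,
    draws one sample per arm from its posterior, and plays the argmax.
    Exploration arises from the randomness of the posterior draws.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its current posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Over a long horizon the posterior of the best arm concentrates, so suboptimal arms are sampled highest only rarely, which is the mechanism behind the regret bounds these papers analyze.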

Simple Bayesian algorithms for best arm identification

D Russo - Conference on Learning Theory, 2016 - proceedings.mlr.press
This paper considers the optimal adaptive allocation of measurement effort for identifying the
best among a finite set of options or designs. An experimenter sequentially chooses designs …

An information-theoretic analysis of Thompson sampling

D Russo, B Van Roy - Journal of Machine Learning Research, 2016 - jmlr.org
We provide an information-theoretic analysis of Thompson sampling that applies across a
broad range of online optimization problems in which a decision-maker must learn from …

Linear Thompson sampling revisited

M Abeille, A Lazaric - Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic
linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in …

From bandits to monte-carlo tree search: The optimistic principle applied to optimization and planning

R Munos - Foundations and Trends® in Machine Learning, 2014 - nowpublishers.com
This work covers several aspects of the optimism in the face of uncertainty principle applied
to large scale optimization problems under finite numerical budget. The initial motivation for …

Thompson sampling for complex online problems

A Gopalan, S Mannor… - … conference on machine …, 2014 - proceedings.mlr.press
We consider stochastic multi-armed bandit problems with complex actions over a set of
basic arms, where the decision maker plays a complex action rather than a basic arm in …

Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays

J Komiyama, J Honda… - … Conference on Machine …, 2015 - proceedings.mlr.press
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are
selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a …
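The multiple-play setting described here, where several arms are pulled per round, admits a natural Thompson-sampling variant: draw a posterior sample per arm and play the top few. The sketch below illustrates that idea only; it is an assumption-laden simplification, not the MP-TS algorithm analyzed in the paper, and the arm means are hypothetical:

```python
import random

def multiple_play_ts(true_means, num_plays, horizon, seed=0):
    """Thompson sampling with `num_plays` arms selected per round.

    Each round, one sample is drawn from every arm's Beta posterior and
    the `num_plays` arms with the largest samples are played and updated.
    """
    rng = random.Random(seed)
    k = len(true_means)
    succ = [0] * k
    fail = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1)
                   for i in range(k)]
        # Play the num_plays arms with the largest posterior samples.
        chosen = sorted(range(k), key=lambda i: samples[i],
                        reverse=True)[:num_plays]
        for arm in chosen:
            r = 1 if rng.random() < true_means[arm] else 0
            succ[arm] += r
            fail[arm] += 1 - r
    return succ, fail
```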

Data poisoning attacks on stochastic bandits

F Liu, N Shroff - International Conference on Machine …, 2019 - proceedings.mlr.press
Stochastic multi-armed bandits form a class of online learning problems that have important
applications in online recommendation systems, adaptive medical treatment, and many …