[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Learning to optimize via posterior sampling

D Russo, B Van Roy - Mathematics of Operations Research, 2014 - pubsonline.informs.org
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …

Near-optimal regret bounds for Thompson sampling

S Agrawal, N Goyal - Journal of the ACM (JACM), 2017 - dl.acm.org
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is
a randomized algorithm based on Bayesian ideas and has recently generated significant …
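The randomized Bayesian idea behind Thompson sampling, as summarized in the entries above, can be sketched in its standard Beta-Bernoulli form (a minimal illustration, not code from any of the cited papers; the arm means and horizon are hypothetical):

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a K-armed bandit.

    Keeps a Beta(successes + 1, failures + 1) posterior for each arm,
    draws one sample per arm from its posterior, and plays the argmax.
    Exploration arises from the randomness of the posterior draws.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its current posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Over a long horizon the posterior of the best arm concentrates, so suboptimal arms are sampled highest only rarely, which is the mechanism behind the regret bounds these papers analyze.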

Simple Bayesian algorithms for best arm identification

D Russo - Conference on Learning Theory, 2016 - proceedings.mlr.press
This paper considers the optimal adaptive allocation of measurement effort for identifying the
best among a finite set of options or designs. An experimenter sequentially chooses designs …

An information-theoretic analysis of Thompson sampling

D Russo, B Van Roy - Journal of Machine Learning Research, 2016 - jmlr.org
We provide an information-theoretic analysis of Thompson sampling that applies across a
broad range of online optimization problems in which a decision-maker must learn from …

Linear Thompson sampling revisited

M Abeille, A Lazaric - Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic
linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in …

From bandits to monte-carlo tree search: The optimistic principle applied to optimization and planning

R Munos - Foundations and Trends® in Machine Learning, 2014 - nowpublishers.com
This work covers several aspects of the optimism in the face of uncertainty principle applied
to large scale optimization problems under finite numerical budget. The initial motivation for …

Thompson sampling for complex online problems

A Gopalan, S Mannor… - … conference on machine …, 2014 - proceedings.mlr.press
We consider stochastic multi-armed bandit problems with complex actions over a set of
basic arms, where the decision maker plays a complex action rather than a basic arm in …

Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays

J Komiyama, J Honda… - … Conference on Machine …, 2015 - proceedings.mlr.press
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are
selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a …
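The multiple-play setting described here, where several arms are pulled per round, admits a natural Thompson-sampling variant: draw a posterior sample per arm and play the top few. The sketch below illustrates that idea only; it is an assumption-laden simplification, not the MP-TS algorithm analyzed in the paper, and the arm means are hypothetical:

```python
import random

def multiple_play_ts(true_means, num_plays, horizon, seed=0):
    """Thompson sampling with `num_plays` arms selected per round.

    Each round, one sample is drawn from every arm's Beta posterior and
    the `num_plays` arms with the largest samples are played and updated.
    """
    rng = random.Random(seed)
    k = len(true_means)
    succ = [0] * k
    fail = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1)
                   for i in range(k)]
        # Play the num_plays arms with the largest posterior samples.
        chosen = sorted(range(k), key=lambda i: samples[i],
                        reverse=True)[:num_plays]
        for arm in chosen:
            r = 1 if rng.random() < true_means[arm] else 0
            succ[arm] += r
            fail[arm] += 1 - r
    return succ, fail
```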

Data poisoning attacks on stochastic bandits

F Liu, N Shroff - International Conference on Machine …, 2019 - proceedings.mlr.press
Stochastic multi-armed bandits form a class of online learning problems that have important
applications in online recommendation systems, adaptive medical treatment, and many …