On Bayesian upper confidence bounds for bandit problems

SCH Hoi, D Sahoo, J Lu, P Zhao - Neurocomputing, 2021 - Elsevier

Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …

Cited by 880 · Related articles · All 6 versions

[PDF] nowpublishers.com

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Cited by 1288 · Related articles · All 34 versions

[PDF] tor-lattimore.com

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3253 · Related articles · All 9 versions

[PDF] ucl.ac.uk

Learning to reinforcement learn

JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer… - arXiv preprint arXiv …, 2016 - arxiv.org

In recent years deep reinforcement learning (RL) systems have attained superhuman
performance in a number of challenging task domains. However, a major limitation of such …

Cited by 1115 · Related articles · All 8 versions

[PDF] ieee.org

Taking the human out of the loop: A review of Bayesian optimization

B Shahriari, K Swersky, Z Wang… - Proceedings of the …, 2015 - ieeexplore.ieee.org

Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …

Cited by 5973 · Related articles · All 14 versions

[PDF] sciencedirect.com

The empirical status of predictive coding and active inference

R Hodson, M Mehta, R Smith - Neuroscience & Biobehavioral Reviews, 2024 - Elsevier

Research on predictive processing models has focused largely on two specific algorithmic
theories: Predictive Coding for perception and Active Inference for decision-making. While …

Cited by 28 · Related articles · All 4 versions

[PDF] nowpublishers.com

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com

Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Cited by 589 · Related articles · All 11 versions

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3274 · Related articles · All 26 versions

[PDF] jmlr.org

[PDF] On the complexity of best-arm identification in multi-armed bandit models

E Kaufmann, O Cappé, A Garivier - The Journal of Machine Learning …, 2016 - jmlr.org

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in
many different contexts in statistics and machine learning. Whereas the achievable limit in …

Cited by 647 · Related articles · All 14 versions

[PDF] mlr.press

Analysis of Thompson sampling for the multi-armed bandit problem

S Agrawal, N Goyal - Conference on learning theory, 2012 - proceedings.mlr.press

The multi-armed bandit problem is a popular model for studying exploration/exploitation
trade-off in sequential decision problems. Many algorithms are now available for this well …

Cited by 1647 · Related articles · All 14 versions
