Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning...

Lake water temperature modeling in an era of climate change: Data sources, models, and future prospects

S Piccolroaz, S Zhu, R Ladwig, L Carrea… - Reviews of …, 2024 - Wiley Online Library

Lake thermal dynamics have been considerably impacted by climate change, with potential
adverse effects on aquatic ecosystems. To better understand the potential impacts of future …

Cited by 16 · Related articles · All 8 versions

[PDF] springer.com

Review on ranking and selection: A new perspective

LJ Hong, W Fan, J Luo - Frontiers of Engineering Management, 2021 - Springer

In this paper, we briefly review the development of ranking and selection (R&S) in the past
70 years, especially the theoretical achievements and practical applications in the past 20 …

Cited by 107 · Related articles · All 12 versions

[PDF] tor-lattimore.com

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2999 · Related articles · All 9 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1113 · Related articles · All 7 versions

[PDF] jmlr.org

Hyperband: A novel bandit-based approach to hyperparameter optimization

L Li, K Jamieson, G DeSalvo, A Rostamizadeh… - Journal of Machine …, 2018 - jmlr.org

Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …

Cited by 2903 · Related articles · All 13 versions

[PDF] mlr.press

Sample-optimal parametric q-learning using linearly additive features

L Yang, M Wang - International conference on machine …, 2019 - proceedings.mlr.press

Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …

Cited by 348 · Related articles · All 9 versions

[PDF] mlr.press

Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds

A Zanette, E Brunskill - International Conference on Machine …, 2019 - proceedings.mlr.press

Strong worst-case performance bounds for episodic reinforcement learning exist but
fortunately in practice RL algorithms perform much better than such bounds would predict …

Cited by 306 · Related articles · All 8 versions

[PDF] mlr.press

Non-stochastic best arm identification and hyperparameter optimization

K Jamieson, A Talwalkar - Artificial intelligence and statistics, 2016 - proceedings.mlr.press

Motivated by the task of hyperparameter optimization, we introduce the non-stochastic
best-arm identification problem. We identify an attractive algorithm for this setting that makes …

Cited by 727 · Related articles · All 8 versions

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3184 · Related articles · All 26 versions

[PDF] jmlr.org

[PDF] On the complexity of best-arm identification in multi-armed bandit models

E Kaufmann, O Cappé, A Garivier - The Journal of Machine Learning …, 2016 - jmlr.org

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in
many different contexts in statistics and machine learning. Whereas the achievable limit in …

Cited by 626 · Related articles · All 14 versions
