Bandits with many optimal arms

Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - … on Machine Learning, 2023 - proceedings.mlr.press

Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …

Cited by 14 Related articles All 6 versions

[PDF] neurips.cc

Multi-armed bandits with bounded arm-memory: Near-optimal guarantees for best-arm identification and regret minimization

A Maiti, V Patil, A Khan - Advances in Neural Information …, 2021 - proceedings.neurips.cc

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory.
In this setting, the arms arrive in a stream, and the number of arms that can be stored in the …

Cited by 16 Related articles All 7 versions

[PDF] neurips.cc

Asymptotically optimal quantile pure exploration for infinite-armed bandits

EXY Gong, M Sellke - Advances in Neural Information …, 2023 - proceedings.neurips.cc

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown
distribution. Our goal is to efficiently select a single high quality arm whose average reward …

Cited by 1 Related articles All 3 versions

[PDF] mlr.press

Active ranking of experts based on their performances in many tasks

EM Saad, N Verzelen… - … Conference on Machine …, 2023 - proceedings.mlr.press

We consider the problem of ranking n experts based on their performances on d tasks. We
make a monotonicity assumption stating that for each pair of experts, one outperforms the …

Cited by 2 Related articles All 7 versions

[PDF] aaai.org

AC-Band: A combinatorial bandit-based approach to algorithm configuration

J Brandt, E Schede, B Haddenhorst, V Bengs… - Proceedings of the …, 2023 - ojs.aaai.org

We study the algorithm configuration (AC) problem, in which one seeks to find an optimal
parameter configuration of a given target algorithm in an automated way. Although this field …

Cited by 5 Related articles All 5 versions

[PDF] neurips.cc

Dynamic learning in large matching markets

A Kalvit, A Zeevi - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We study a sequential matching problem faced by "large" centralized platforms where "jobs"
must be matched to "workers" subject to uncertainty about worker skill proficiencies. Jobs …

Cited by 5 Related articles All 8 versions

[PDF] arxiv.org

LEGS: Learning efficient grasp sets for exploratory grasping

L Fu, M Danielczuk, A Balakrishna… - … on Robotics and …, 2022 - ieeexplore.ieee.org

While deep learning has enabled significant progress in designing general purpose robot
grasping systems, there remain objects which still pose challenges for these systems …

Cited by 9 Related articles All 4 versions

[PDF] neurips.cc

Stochastic bandits with groups of similar arms

F Pesquerel, H Saber… - Advances in Neural …, 2021 - proceedings.neurips.cc

We consider a variant of the stochastic multi-armed bandit problem where arms are known
to be organized into different groups having the same mean. The groups are unknown but a …

Cited by 8 Related articles All 11 versions

[PDF] arxiv.org

Best Arm Identification for Stochastic Rising Bandits

M Mussi, A Montenegro, F Trovó, M Restelli… - arXiv preprint arXiv …, 2023 - arxiv.org

Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the
expected rewards of the available options increase every time they are selected. This setting …

Cited by 6 Related articles All 6 versions

[PDF] arxiv.org

Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits

XY Gong, M Sellke - arXiv preprint arXiv:2306.01995, 2023 - arxiv.org

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown
distribution. Our goal is to efficiently select a single high quality arm whose average reward …

Cited by 2 Related articles All 2 versions
