Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - … on Machine Learning, 2023 - proceedings.mlr.press
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …

Multi-armed bandits with bounded arm-memory: Near-optimal guarantees for best-arm identification and regret minimization

A Maiti, V Patil, A Khan - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory.
In this setting, the arms arrive in a stream, and the number of arms that can be stored in the …

Asymptotically optimal quantile pure exploration for infinite-armed bandits

EXY Gong, M Sellke - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We study pure exploration with infinitely many bandit arms generated iid from an unknown
distribution. Our goal is to efficiently select a single high quality arm whose average reward …

Active ranking of experts based on their performances in many tasks

EM Saad, N Verzelen… - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of ranking n experts based on their performances on d tasks. We
make a monotonicity assumption stating that for each pair of experts, one outperforms the …

AC-Band: A combinatorial bandit-based approach to algorithm configuration

J Brandt, E Schede, B Haddenhorst, V Bengs… - Proceedings of the …, 2023 - ojs.aaai.org
We study the algorithm configuration (AC) problem, in which one seeks to find an optimal
parameter configuration of a given target algorithm in an automated way. Although this field …

Dynamic learning in large matching markets

A Kalvit, A Zeevi - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We study a sequential matching problem faced by "large" centralized platforms where "jobs"
must be matched to "workers" subject to uncertainty about worker skill proficiencies. Jobs …

LEGS: Learning efficient grasp sets for exploratory grasping

L Fu, M Danielczuk, A Balakrishna… - … on Robotics and …, 2022 - ieeexplore.ieee.org
While deep learning has enabled significant progress in designing general purpose robot
grasping systems, there remain objects which still pose challenges for these systems …

Stochastic bandits with groups of similar arms

F Pesquerel, H Saber… - Advances in Neural …, 2021 - proceedings.neurips.cc
We consider a variant of the stochastic multi-armed bandit problem where arms are known
to be organized into different groups having the same mean. The groups are unknown but a …

Best Arm Identification for Stochastic Rising Bandits

M Mussi, A Montenegro, F Trovó, M Restelli… - arXiv preprint arXiv …, 2023 - arxiv.org
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the
expected rewards of the available options increase every time they are selected. This setting …

Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits

XY Gong, M Sellke - arXiv preprint arXiv:2306.01995, 2023 - arxiv.org
We study pure exploration with infinitely many bandit arms generated iid from an unknown
distribution. Our goal is to efficiently select a single high quality arm whose average reward …