[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

A survey of online experiment design with the stochastic multi-armed bandit

G Burtini, J Loeppky, R Lawrence - arXiv preprint arXiv:1510.00757, 2015 - arxiv.org
Adaptive and sequential experiment design is a well-studied area in numerous domains. We
survey and synthesize the work of the online statistical learning paradigm referred to as multi …

Recommendation system for adaptive learning

Y Chen, X Li, J Liu, Z Ying - Applied psychological …, 2018 - journals.sagepub.com
An adaptive learning system aims at providing instruction tailored to the current status of a
learner, differing from the traditional classroom experience. The latest advances in …

Reinforcement learning for sequential decision making in population research

N Deliu - Quality & Quantity, 2024 - Springer
Reinforcement learning (RL) algorithms have been long recognized as powerful tools for
optimal sequential decision making. The framework is concerned with a decision maker, the …

The Gittins policy is nearly optimal in the M/G/k under extremely general conditions

Z Scully, I Grosof, M Harchol-Balter - … of the ACM on Measurement and …, 2020 - dl.acm.org
The Gittins scheduling policy minimizes the mean response time in the single-server M/G/1
queue in a wide variety of settings. Most famously, Gittins is optimal when preemption is …

[BOOK][B] Multi-armed bandits: Theory and applications to online learning in networks

Q Zhao - 2019 - books.google.com
Multi-armed bandit problems pertain to optimal sequential decision making and learning in
unknown environments. Since the first bandit problem posed by Thompson in 1933 for the …

The assistive multi-armed bandit

L Chan, D Hadfield-Menell, S Srinivasa… - 2019 14th ACM/IEEE …, 2019 - ieeexplore.ieee.org
Learning preferences implicit in the choices humans make is a well studied problem in both
economics and computer science. However, most work makes the assumption that humans …

A new toolbox for scheduling theory

Z Scully - ACM SIGMETRICS Performance Evaluation Review, 2023 - dl.acm.org
Queueing delays are ubiquitous in many domains, including computer systems, service
systems, communication networks, supply chains, and transportation. Queueing and …

Conditions for indexability of restless bandits and an algorithm to compute Whittle index

N Akbarzadeh, A Mahajan - Advances in Applied Probability, 2022 - cambridge.org
Restless bandits are a class of sequential resource allocation problems concerned with
allocating one or more resources among several alternative processes where the evolution …

Multi-armed bandits with bounded arm-memory: Near-optimal guarantees for best-arm identification and regret minimization

A Maiti, V Patil, A Khan - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory.
In this setting, the arms arrive in a stream, and the number of arms that can be stored in the …