Adaptive and sequential experiment design is a well-studied area in numerous domains. We survey and synthesize the work of the online statistical learning paradigm referred to as multi …
An adaptive learning system aims at providing instruction tailored to the current status of a learner, differing from the traditional classroom experience. The latest advances in …
Reinforcement learning (RL) algorithms have long been recognized as powerful tools for optimal sequential decision making. The framework is concerned with a decision maker, the …
The Gittins scheduling policy minimizes the mean response time in the single-server M/G/1 queue in a wide variety of settings. Most famously, Gittins is optimal when preemption is …
Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments. Since the first bandit problem posed by Thompson in 1933 for the …
Learning preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work makes the assumption that humans …
Queueing delays are ubiquitous in many domains, including computer systems, service systems, communication networks, supply chains, and transportation. Queueing and …
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution …
A Maiti, V Patil, A Khan - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the …