Rotting bandits

N Levine, K Crammer… - Advances in neural …, 2017 - proceedings.neurips.cc
The Multi-Armed Bandits (MAB) framework highlights the trade-off between
acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation) …

Stochastic rising bandits

AM Metelli, F Trovo, M Pirola… - … Conference on Machine …, 2022 - proceedings.mlr.press
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential
selection techniques able to learn online using only the feedback given by the chosen …

Stochastic bandits with delay-dependent payoffs

L Cella, N Cesa-Bianchi - International Conference on …, 2020 - proceedings.mlr.press
Motivated by recommendation problems in music streaming platforms, we propose a
nonstationary stochastic bandit model in which the expected reward of an arm depends on …

Rotting bandits are no harder than stochastic ones

J Seznec, A Locatelli, A Carpentier… - The 22nd …, 2019 - proceedings.mlr.press
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be
stationary. This assumption is often violated in practice (e.g., in recommendation systems) …

Efficient automatic CASH via rising bandits

Y Li, J Jiang, J Gao, Y Shao, C Zhang, B Cui - Proceedings of the AAAI …, 2020 - aaai.org
The Combined Algorithm Selection and Hyperparameter optimization (CASH) is
one of the most fundamental problems in Automatic Machine Learning (AutoML). The …

Recovering bandits

C Pike-Burke, S Grunewalder - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the recovering bandits problem, a variant of the stochastic multi-armed bandit
problem where the expected reward of each arm varies according to some unknown …

Fighting boredom in recommender systems with linear reinforcement learning

R Warlop, A Lazaric, J Mary - Advances in Neural …, 2018 - proceedings.neurips.cc
A common assumption in recommender systems (RS) is the existence of a best fixed
recommendation strategy. Such strategy may be simple and work at the item level (e.g., in …

Rebounding bandits for modeling satiation effects

L Leqi, F Kilinc Karzan, Z Lipton… - Advances in Neural …, 2021 - proceedings.neurips.cc
Psychological research shows that enjoyment of many goods is subject to satiation, with
short-term satisfaction declining after repeated exposures to the same item. Nevertheless …

A single algorithm for both restless and rested rotting bandits

J Seznec, P Menard, A Lazaric… - … Conference on Artificial …, 2020 - proceedings.mlr.press
In many application domains (e.g., recommender systems, intelligent tutoring systems), the
rewards associated to the available actions tend to decrease over time. This decay is either …

Analysing the impact of travel information for minimising the regret of route choice

GO Ramos, ALC Bazzan, BC da Silva - Transportation Research Part C …, 2018 - Elsevier
In the route choice problem, self-interested drivers aim at choosing routes that minimise
travel costs between their origins and destinations. We model this problem as a multiagent …