Tight Policy Regret Bounds for Improving and Decaying Bandits.

N Levine, K Crammer… - Advances in neural …, 2017 - proceedings.neurips.cc

Abstract The Multi-Armed Bandits (MAB) framework highlights the trade-off between
acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation) …

Cited by 140 Related articles All 6 versions

[PDF] mlr.press

Stochastic rising bandits

AM Metelli, F Trovo, M Pirola… - … Conference on Machine …, 2022 - proceedings.mlr.press

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential
selection techniques able to learn online using only the feedback given by the chosen …

Cited by 25 Related articles All 7 versions

[PDF] mlr.press

Stochastic bandits with delay-dependent payoffs

L Cella, N Cesa-Bianchi - International Conference on …, 2020 - proceedings.mlr.press

Motivated by recommendation problems in music streaming platforms, we propose a
nonstationary stochastic bandit model in which the expected reward of an arm depends on …

Cited by 53 Related articles All 10 versions

[PDF] mlr.press

Rotting bandits are no harder than stochastic ones

J Seznec, A Locatelli, A Carpentier… - The 22nd …, 2019 - proceedings.mlr.press

In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be
stationary. This assumption is often violated in practice (e.g., in recommendation systems) …

Cited by 67 Related articles All 10 versions

[PDF] aaai.org

Efficient automatic CASH via rising bandits

Y Li, J Jiang, J Gao, Y Shao, C Zhang, B Cui - Proceedings of the AAAI …, 2020 - aaai.org

Abstract The Combined Algorithm Selection and Hyperparameter optimization (CASH) is
one of the most fundamental problems in Automatic Machine Learning (AutoML). The …

Cited by 48 Related articles All 7 versions

[PDF] neurips.cc

Recovering bandits

C Pike-Burke, S Grunewalder - Advances in Neural …, 2019 - proceedings.neurips.cc

We study the recovering bandits problem, a variant of the stochastic multi-armed bandit
problem where the expected reward of each arm varies according to some unknown …

Cited by 50 Related articles All 10 versions

[PDF] neurips.cc

Fighting boredom in recommender systems with linear reinforcement learning

R Warlop, A Lazaric, J Mary - Advances in Neural …, 2018 - proceedings.neurips.cc

A common assumption in recommender systems (RS) is the existence of a best fixed
recommendation strategy. Such strategy may be simple and work at the item level (e.g., in …

Cited by 54 Related articles All 8 versions

[PDF] neurips.cc

Rebounding bandits for modeling satiation effects

L Leqi, F Kilinc Karzan, Z Lipton… - Advances in Neural …, 2021 - proceedings.neurips.cc

Psychological research shows that enjoyment of many goods is subject to satiation, with
short-term satisfaction declining after repeated exposures to the same item. Nevertheless …

Cited by 30 Related articles All 6 versions

[PDF] mlr.press

A single algorithm for both restless and rested rotting bandits

J Seznec, P Menard, A Lazaric… - … Conference on Artificial …, 2020 - proceedings.mlr.press

In many application domains (e.g., recommender systems, intelligent tutoring systems), the
rewards associated to the available actions tend to decrease over time. This decay is either …

Cited by 39 Related articles All 8 versions

[PDF] researchgate.net

Analysing the impact of travel information for minimising the regret of route choice

GO Ramos, ALC Bazzan, BC da Silva - Transportation Research Part C …, 2018 - Elsevier

In the route choice problem, self-interested drivers aim at choosing routes that minimise
travel costs between their origins and destinations. We model this problem as a multiagent …

Cited by 51 Related articles All 5 versions
