Bandit algorithms: A comprehensive review and their dynamic selection from a portfolio for multicriteria top-k recommendation

A Letard, N Gutowski, O Camp, T Amghar - Expert Systems with …, 2024 - Elsevier
This paper discusses the use of portfolio approaches based on bandit algorithms to optimize
multicriteria decision-making in recommender systems (accuracy and diversity). While …

A framework for adapting offline algorithms to solve combinatorial multi-armed bandit problems with bandit feedback

G Nie, YY Nadew, Y Zhu… - … on Machine Learning, 2023 - proceedings.mlr.press
We investigate the problem of stochastic, combinatorial multi-armed bandits where the
learner only has access to bandit feedback and the reward function can be non-linear. We …

Mitigating exposure bias in online learning to rank recommendation: A novel reward model for cascading bandits

M Mansoury, B Mobasher, H van Hoof - Proceedings of the 33rd ACM …, 2024 - dl.acm.org
Exposure bias is a well-known issue in recommender systems where items and suppliers
are not equally represented in the recommendation results. This bias becomes particularly …

Minimax regret for cascading bandits

D Vial, S Sanghavi, S Shakkottai… - Advances in Neural …, 2022 - proceedings.neurips.cc
Cascading bandits is a natural and popular model that frames the task of learning to rank
from Bernoulli click feedback in a bandit setting. For the case of unstructured rewards, we …

A hybrid bandit framework for diversified recommendation

Q Ding, Y Liu, C Miao, F Cheng, H Tang - Proceedings of the AAAI …, 2021 - ojs.aaai.org
The interactive recommender systems involve users in the recommendation procedure by
receiving timely user feedback to update the recommendation policy. Therefore, they are …

Cascading hybrid bandits: Online learning to rank for relevance and diversity

C Li, H Feng, M Rijke - Proceedings of the 14th ACM Conference on …, 2020 - dl.acm.org
Relevance ranking and result diversification are two core areas in modern recommender
systems. Relevance ranking aims at building a ranked list sorted in decreasing order of item …

On the value of prior in online learning to rank

B Kveton, O Meshi, M Zoghi… - … Conference on Artificial …, 2022 - proceedings.mlr.press
This paper addresses the cold-start problem in online learning to rank (OLTR). We show
both theoretically and empirically that priors improve the quality of ranked lists presented to …

Submodular bandit problem under multiple constraints

S Takemori, M Sato, T Sonoda… - … on Uncertainty in …, 2020 - proceedings.mlr.press
The linear submodular bandit problem was proposed to simultaneously address diversified
retrieval and online learning in a recommender system. If there is no uncertainty, this …

Learning to make decisions via submodular regularization

A Alieva, A Aceves, J Song, S Mayo, Y Yue… - … Conference on Learning …, 2020 - par.nsf.gov
Many sequential decision making tasks can be viewed as combinatorial optimization
problems over a large number of actions. When the cost of evaluating an action is high …

Context uncertainty in contextual bandits with applications to recommender systems

H Wang, Y Ma, H Ding, Y Wang - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Recurrent neural networks have proven effective in modeling sequential user feedbacks for
recommender systems. However, they usually focus solely on item relevance and fail to …