[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

"Deep reinforcement learning for search, recommendation, and online advertising: a survey" by Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin with Martin …

X Zhao, L Xia, J Tang, D Yin - ACM SIGWEB Newsletter, 2019 - dl.acm.org
Search, recommendation, and online advertising are the three most important information-
providing mechanisms on the web. These information seeking techniques, satisfying users' …

Hierarchical bayesian bandits

J Hong, B Kveton, M Zaheer… - International …, 2022 - proceedings.mlr.press
Meta-, multi-task, and federated learning can be all viewed as solving similar tasks,
drawn from a distribution that reflects task similarities. We provide a unified view of all these …

Cascading bandits for large-scale recommendation problems

S Zong, H Ni, K Sung, NR Ke, Z Wen… - arXiv preprint arXiv …, 2016 - arxiv.org
Most recommender systems recommend a list of items. The user examines the list, from the
first item to the last, and often chooses the first attractive item and does not examine the rest …

Carousel personalization in music streaming apps with contextual bandits

W Bendada, G Salha, T Bontempelli - … of the 14th ACM Conference on …, 2020 - dl.acm.org
Media services providers, such as music streaming platforms, frequently leverage swipeable
carousels to recommend personalized content to their users. However, selecting the most …

A visual dialog augmented interactive recommender system

T Yu, Y Shen, H Jin - Proceedings of the 25th ACM SIGKDD international …, 2019 - dl.acm.org
Traditional recommender systems rely on user feedback such as ratings or clicks to the
items, to analyze the user interest and provide personalized recommendations. However …

Unbiased learning to rank: online or offline?

Q Ai, T Yang, H Wang, J Mao - ACM Transactions on Information …, 2021 - dl.acm.org
How to obtain an unbiased ranking model by learning to rank with biased user feedback is
an important research question for IR. Existing work on unbiased learning to rank (ULTR) …

Online learning to rank in stochastic click models

M Zoghi, T Tunys, M Ghavamzadeh… - International …, 2017 - proceedings.mlr.press
Online learning to rank is a core problem in information retrieval and machine learning.
Many provably efficient algorithms have been recently proposed for this problem in specific …

Multiple-play bandits in the position-based model

P Lagrée, C Vernade, O Cappé - Advances in Neural …, 2016 - proceedings.neurips.cc
Sequentially learning to place items in multi-position displays or lists is a task that can be
cast into the multiple-play semi-bandit setting. However, a major concern in this context is …

Stochastic bandits with delay-dependent payoffs

L Cella, N Cesa-Bianchi - International Conference on …, 2020 - proceedings.mlr.press
Motivated by recommendation problems in music streaming platforms, we propose a
nonstationary stochastic bandit model in which the expected reward of an arm depends on …