Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Human preferences as dueling bandits

X Yan, C Luo, CLA Clarke, N Craswell… - Proceedings of the 45th …, 2022 - dl.acm.org
The dramatic improvements in core information retrieval tasks engendered by neural
rankers create a need for novel evaluation methods. If every ranker returns highly relevant …

Cascading hybrid bandits: Online learning to rank for relevance and diversity

C Li, H Feng, M de Rijke - Proceedings of the 14th ACM Conference on …, 2020 - dl.acm.org
Relevance ranking and result diversification are two core areas in modern recommender
systems. Relevance ranking aims at building a ranked list sorted in decreasing order of item …

Preference-based offline evaluation

CLA Clarke, F Diaz, N Arabzadeh - … on Web Search and Data Mining, 2023 - dl.acm.org
A core step in production model research and development involves the offline evaluation of
a system before production deployment. Traditional offline evaluation of search …

An asymptotically optimal batched algorithm for the dueling bandit problem

A Agarwal, R Ghuge - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed
bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous …

Reinforcement online learning to rank with unbiased reward shaping

S Zhuang, Z Qiao, G Zuccon - Information Retrieval Journal, 2022 - Springer
Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived
from users' interactions, such as clicks. Clicks however are a biased signal: specifically, top …

Non-stationary dueling bandits

P Kolpaczki, V Bengs, E Hüllermeier - arXiv preprint arXiv:2202.00935, 2022 - arxiv.org
We study the non-stationary dueling bandits problem with $K$ arms, where the time
horizon $T$ consists of $M$ stationary segments, each of which is associated with its own …

Testification of Condorcet winners in dueling bandits

B Haddenhorst, V Bengs, J Brandt… - Uncertainty in …, 2021 - proceedings.mlr.press
Several algorithms for finding the best arm in the dueling bandits setting assume the
existence of a Condorcet winner (CW), that is, an arm that uniformly dominates all other …

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit

T Huang, K Li - arXiv preprint arXiv:2311.14003, 2023 - arxiv.org
Optimization problems find widespread use in both single-objective and multi-objective
scenarios. In practical applications, users aspire for solutions that converge to the region of …

The Power of Adaptivity for Decision-Making under Uncertainty

R Ghuge - 2023 - deepblue.lib.umich.edu
In this thesis, we study the role of adaptivity in decision-making problems under uncertainty.
The first part of the thesis focuses on combinatorial problems, while the second part of the …