MergeDTS: A method for effective large-scale online ranker evaluation

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 99 · Related articles · All 7 versions

[PDF] arxiv.org

Human preferences as dueling bandits

X Yan, C Luo, CLA Clarke, N Craswell… - Proceedings of the 45th …, 2022 - dl.acm.org

The dramatic improvements in core information retrieval tasks engendered by neural
rankers create a need for novel evaluation methods. If every ranker returns highly relevant …

Cited by 14 · Related articles · All 7 versions

[PDF] arxiv.org

Cascading hybrid bandits: Online learning to rank for relevance and diversity

C Li, H Feng, M Rijke - Proceedings of the 14th ACM Conference on …, 2020 - dl.acm.org

Relevance ranking and result diversification are two core areas in modern recommender
systems. Relevance ranking aims at building a ranked list sorted in decreasing order of item …

Cited by 35 · Related articles · All 6 versions

Preference-based offline evaluation

CLA Clarke, F Diaz, N Arabzadeh - … on Web Search and Data Mining, 2023 - dl.acm.org

A core step in production model research and development involves the offline evaluation of
a system before production deployment. Traditional offline evaluation of search …

Cited by 5 · Related articles

[PDF] neurips.cc

An asymptotically optimal batched algorithm for the dueling bandit problem

A Agarwal, R Ghuge - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed
bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous …

Cited by 2 · Related articles · All 7 versions

[HTML] springer.com

Reinforcement online learning to rank with unbiased reward shaping

S Zhuang, Z Qiao, G Zuccon - Information Retrieval Journal, 2022 - Springer

Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived
from users' interactions, such as clicks. Clicks however are a biased signal: specifically, top …

Cited by 7 · Related articles · All 8 versions

[PDF] arxiv.org

Non-stationary dueling bandits

P Kolpaczki, V Bengs, E Hüllermeier - arXiv preprint arXiv:2202.00935, 2022 - arxiv.org

We study the non-stationary dueling bandits problem with $K$ arms, where the time
horizon $T$ consists of $M$ stationary segments, each of which is associated with its own …

Cited by 5 · Related articles · All 2 versions

[PDF] mlr.press

Testification of Condorcet winners in dueling bandits

B Haddenhorst, V Bengs, J Brandt… - Uncertainty in …, 2021 - proceedings.mlr.press

Several algorithms for finding the best arm in the dueling bandits setting assume the
existence of a Condorcet winner (CW), that is, an arm that uniformly dominates all other …

Cited by 4 · Related articles · All 7 versions

[PDF] arxiv.org

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit

T Huang, K Li - arXiv preprint arXiv:2311.14003, 2023 - arxiv.org

Optimization problems find widespread use in both single-objective and multi-objective
scenarios. In practical applications, users aspire for solutions that converge to the region of …

The Power of Adaptivity for Decision-Making under Uncertainty

R Ghuge - 2023 - deepblue.lib.umich.edu

In this thesis, we study the role of adaptivity in decision-making problems under uncertainty.
The first part of the thesis focuses on combinatorial problems, while the second part of the …
