Learning from delayed outcomes via proxies with applications to recommender systems

MD Ekstrand, A Das, R Burke… - Foundations and Trends …, 2022 - nowpublishers.com

Recommendation, information retrieval, and other information access systems pose unique
challenges for investigating and applying the fairness and non-discrimination concepts that …

Cited by 167 · Related articles · All 13 versions

Adapting to delays and data in adversarial multi-armed bandits

A Gyorgy, P Joulani - International Conference on Machine …, 2021 - proceedings.mlr.press

We consider the adversarial multi-armed bandit problem under delayed feedback. We
analyze variants of the Exp3 algorithm that tune their step size using only information (about …

Cited by 32 · Related articles · All 5 versions

[PDF] Fairness and discrimination in information access systems

MD Ekstrand, A Das, R Burke, F Diaz - arXiv preprint arXiv …, 2021 - academia.edu

Recommendation, information retrieval, and other information access systems pose unique
challenges for investigating and applying the fairness and non-discrimination concepts that …

Cited by 26 · Related articles

Non-stationary delayed bandits with intermediate observations

C Vernade, A Gyorgy, T Mann - International Conference on …, 2020 - proceedings.mlr.press

Online recommender systems often face long delays in receiving feedback, especially when
optimizing for some long-term metrics. While mitigating the effects of delays in learning is …

Cited by 22 · Related articles · All 5 versions

Reinforcement learning: a friendly introduction

D Daoun, F Ibnat, Z Alom, Z Aung, MA Azim - The International Conference …, 2021 - Springer

Reinforcement Learning (RL) is a branch of machine learning (ML) that is used to train
artificial intelligence (AI) systems and find the optimal solution for problems. This tutorial …

Cited by 11 · Related articles · All 2 versions

Rethinking Knowledge Transfer in Learning Using Privileged Information

D Provodin, B Akker, C Katsimerou, M Kaptein… - arXiv preprint arXiv …, 2024 - arxiv.org

In supervised machine learning, privileged information (PI) is information that is unavailable
at inference, but is accessible during training time. Research on learning using privileged …

Nonstochastic bandits with composite anonymous feedback

N Cesa-Bianchi, T Cesari, R Colomboni… - Journal of Machine …, 2022 - jmlr.org

We investigate a nonstochastic bandit setting in which the loss of an action is not
immediately charged to the player, but rather spread over the subsequent rounds in an …

Cited by 8 · Related articles · All 10 versions

Delayed bandits: when do intermediate observations help?

E Esposito, S Masoudian, H Qiu… - arXiv preprint arXiv …, 2023 - arxiv.org

We study a $K$-armed bandit with delayed feedback and intermediate observations. We
consider a model where intermediate observations have a form of a finite state, which is …

Cited by 1 · Related articles · All 9 versions

Handling many conversions per click in modeling delayed feedback

A Badanidiyuru, A Evdokimov, V Krishnan, P Li… - arXiv preprint arXiv …, 2021 - arxiv.org

Predicting the expected value or number of post-click conversions (purchases or other
events) is a key task in performance-based digital advertising. In training a conversion …

Cited by 3 · Related articles · All 6 versions

Online Learning, Uniform Convergence, and a Theory of Interpretability

E Esposito - 2024 - tesidottorato.depositolegale.it

This doctoral thesis covers various aspects of theoretical machine learning relative to two of
its most fundamental paradigms: batch learning and online learning. In particular, we …
