A Gyorgy, P Joulani - International Conference on Machine …, 2021 - proceedings.mlr.press
We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step size using only information (about …
Recommendation, information retrieval, and other information access systems pose unique challenges for investigating and applying the fairness and non-discrimination concepts that …
Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is …
D Daoun, F Ibnat, Z Alom, Z Aung, MA Azim - The International Conference …, 2021 - Springer
Reinforcement Learning (RL) is a branch of machine learning (ML) that is used to train artificial intelligence (AI) systems and find the optimal solution for problems. This tutorial …
In supervised machine learning, privileged information (PI) is information that is unavailable at inference, but is accessible during training time. Research on learning using privileged …
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an …
We study a $K$-armed bandit with delayed feedback and intermediate observations. We consider a model where intermediate observations have a form of a finite state, which is …
Predicting the expected value or number of post-click conversions (purchases or other events) is a key task in performance-based digital advertising. In training a conversion …
E Esposito - 2024 - tesidottorato.depositolegale.it
This doctoral thesis covers various aspects of theoretical machine learning relative to two of its most fundamental paradigms: batch learning and online learning. In particular, we …