O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Methods for bandit learning from user interactions often require a model of the reward a certain context-action pair will yield, for example, the probability of a click on a …
Personalized real-time recommendation has had a profound impact on retail, media, entertainment and other industries. However, developing recommender systems for every …
This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from …
O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org
Modern recommender systems are often modelled under the sequential decision-making paradigm, where the system decides which recommendations to show in order to maximise …
Conventional approaches to recommendation often do not explicitly take into account information on previously shown recommendations and their recorded responses. One …
In recent years, a new line of research has taken an interventional view of recommender systems, where recommendations are viewed as actions that the system takes to have a …
We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces where existing methods, most of which rely crucially on reward-regression models …
Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and …
B London, T Sandler - International Conference on Machine …, 2019 - proceedings.mlr.press
We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization …