Imitation-regularized offline learning

Z Zhu, K Lin, AK Jain, J Zhou - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org

Reinforcement learning is a learning paradigm for solving sequential decision-making
problems. Recent years have witnessed remarkable progress in reinforcement learning …

Cited by 731 - Related articles - All 12 versions

[PDF] researchgate.net

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org

Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield, for example, the probability of a click on a …

Cited by 52 - Related articles - All 4 versions

[PDF] acm.org

Temporal-contextual recommendation in real-time

Y Ma, B Narayanaswamy, H Lin, H Ding - Proceedings of the 26th ACM …, 2020 - dl.acm.org

Personalized real-time recommendation has had a profound impact on retail, media,
entertainment and other industries. However, developing recommender systems for every …

Cited by 81 - Related articles - All 6 versions

[PDF] mlr.press

PAC-Bayesian offline contextual bandits with guarantees

O Sakhi, P Alquier, N Chopin - International Conference on …, 2023 - proceedings.mlr.press

This paper introduces a new principled approach for off-policy learning in contextual
bandits. Unlike previous work, our approach does not derive learning principles from …

Cited by 16 - Related articles - All 6 versions

[PDF] acm.org

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org

Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …

Cited by 16 - Related articles - All 2 versions

[PDF] uantwerpen.be

Joint policy-value learning for recommendation

O Jeunen, D Rohde, F Vasile, M Bompaire - Proceedings of the 26th …, 2020 - dl.acm.org

Conventional approaches to recommendation often do not explicitly take into account
information on previously shown recommendations and their recorded responses. One …

Cited by 34 - Related articles - All 4 versions

[PDF] aaai.org

Recommendations as treatments

T Joachims, B London, Y Su, A Swaminathan, L Wang - AI Magazine, 2021 - ojs.aaai.org

In recent years, a new line of research has taken an interventional view of recommender
systems, where recommendations are viewed as actions that the system takes to have a …

Cited by 20 - Related articles - All 8 versions

[PDF] arxiv.org

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

Y Saito, J Yao, T Joachims - arXiv preprint arXiv:2402.06151, 2024 - arxiv.org

We study off-policy learning (OPL) of contextual bandit policies in large discrete action
spaces where existing methods--most of which rely crucially on reward-regression models …

Cited by 4 - Related articles - All 2 versions

[PDF] arxiv.org

Ad-load Balancing via Off-policy Learning in a Content Marketplace

H Sagtani, MG Jhawar, R Mehrotra… - Proceedings of the 17th …, 2024 - dl.acm.org

Ad-load balancing is a critical challenge in online advertising systems, particularly in the
context of social media platforms, where the goal is to maximize user engagement and …

Cited by 6 - Related articles - All 3 versions

[PDF] mlr.press

Bayesian counterfactual risk minimization

B London, T Sandler - International Conference on Machine …, 2019 - proceedings.mlr.press

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning
from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization …

Cited by 38 - Related articles - All 7 versions
