Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and …
X Wang, Q Li, D Yu, G Xu - Proceedings of the ACM Web Conference …, 2022 - dl.acm.org
Reinforcement learning has recently become an active topic in recommender system research, where the logged data that records interactions between items and users …
R Gao, M Biggs, W Sun, L Han - arXiv preprint arXiv:2112.04461, 2021 - arxiv.org
Unlike traditional supervised learning, in many settings only partial feedback is available. We may only observe outcomes for the chosen actions, but not the counterfactual outcomes …
Off-policy learning (OPL) often involves minimizing a risk estimator based on importance weighting to correct bias from the logging policy used to collect data. However, this method …