Y Ban, Y Qi, T Wei, L Liu, J He - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
The contextual bandit has been identified as a powerful framework for formulating recommendation as a sequential decision-making process, where each item is …
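For orientation, the standard contextual bandit protocol that such recommendation papers build on (a generic textbook formulation, not specific to this paper) can be stated as follows:

```latex
% Generic contextual bandit protocol: at each round t = 1, ..., T the learner
% observes a context x_{t,a} for every candidate arm (item) a, pulls one arm
% a_t, and sees only that arm's reward r_{t,a_t}. Performance is measured by
% cumulative regret against the per-round optimal arm:
\[
  R(T) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}
      \bigl( r_{t,a_t^{*}} - r_{t,a_t} \bigr)\right],
  \qquad
  a_t^{*} \;=\; \arg\max_{a}\, \mathbb{E}\bigl[r_{t,a} \mid x_{t,a}\bigr].
\]
```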
YH Hung, PC Hsieh - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the …
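As a quick sketch of the principle, one common bandit formulation of RBMLE from the broader literature (our paraphrase, not necessarily this paper's exact objective) biases the log-likelihood toward parameters that promise higher rewards:

```latex
% Reward-biased MLE: the log-likelihood is augmented with a bias term
% alpha_t * (best achievable mean reward under theta); the bias weight
% grows slowly with t and induces exploration toward optimistic parameters.
\[
  \hat{\theta}_t \in \arg\max_{\theta}
  \Bigl\{ \sum_{s=1}^{t-1} \log p\bigl(r_s \mid a_s; \theta\bigr)
        \;+\; \alpha_t \max_{a}\, \mu(a; \theta) \Bigr\},
  \qquad
  a_t = \arg\max_{a}\, \mu(a; \hat{\theta}_t),
\]
% where \mu(a; \theta) is the mean reward of arm a under \theta and
% \alpha_t is an increasing bias weight (e.g., on the order of \sqrt{t}).
```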
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary …
We propose a novel algorithm for offline reinforcement learning called Value Iteration with Perturbed Rewards (VIPeR), which amalgamates the pessimism principle with random …
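To make the perturbed-rewards idea concrete, here is a minimal one-step sketch (our illustration, using a linear function class and made-up hyperparameters; VIPeR itself performs value iteration over a horizon with neural function approximation):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression: theta = (X'X + lam*I)^{-1} X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def perturbed_pessimistic_fit(X, y, n_models=10, noise_std=1.0, lam=1.0, rng=None):
    """Fit an ensemble on independently noise-perturbed copies of the rewards;
    a pessimistic estimate is then the pointwise minimum over the ensemble.
    (Illustrative hyperparameters; not the paper's implementation.)"""
    rng = np.random.default_rng(rng)
    thetas = []
    for _ in range(n_models):
        y_pert = y + rng.normal(0.0, noise_std, size=y.shape)
        thetas.append(ridge_fit(X, y_pert, lam))
    return np.stack(thetas)                 # shape (n_models, d)

def pessimistic_value(thetas, x):
    # The min over the perturbed ensemble plays the role of an explicit
    # pessimism bonus (a lower-confidence estimate of the value).
    return float(np.min(thetas @ x))

# Toy usage: offline data from a one-step problem with linear rewards.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
theta_true = np.array([1.0, -0.5, 0.2, 0.0, 0.3])
y = X @ theta_true + 0.1 * rng.normal(size=200)
thetas = perturbed_pessimistic_fit(X, y, rng=1)
print(pessimistic_value(thetas, X[0]), X[0] @ theta_true)
```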
K Hong, Y Li, A Tewari - International Conference on …, 2023 - proceedings.mlr.press
We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies …
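The snippet does not say which randomized strategy the paper uses; one generic way to avoid committing to a known degree of non-stationarity is to randomize the amount of history a base learner sees. A toy sketch (our illustration with plain UCB over random windows, not the paper's kernel method):

```python
import numpy as np

def random_window_bandit(reward_fn, n_arms, T, rng=None):
    """Repeatedly sample a random window length, run a fresh UCB instance
    inside it, then discard history. Randomizing the window removes the need
    to know the degree of non-stationarity in advance. (Illustration only.)"""
    rng = np.random.default_rng(rng)
    total, t = 0.0, 0
    while t < T:
        w = int(rng.geometric(1.0 / np.sqrt(T))) + n_arms    # random window length
        counts = np.zeros(n_arms); sums = np.zeros(n_arms)
        for s in range(min(w, T - t)):
            if s < n_arms:
                a = s                                        # pull each arm once
            else:
                ucb = sums / counts + np.sqrt(2 * np.log(s + 1) / counts)
                a = int(np.argmax(ucb))
            r = reward_fn(t, a)
            counts[a] += 1; sums[a] += r; total += r; t += 1
        # Window ends: history is dropped, adapting to possible reward changes.
    return total

# Toy non-stationary environment: the best arm switches halfway through.
best = lambda t: 0 if t < 500 else 1
print(random_window_bandit(lambda t, a: float(a == best(t)), n_arms=2, T=1000, rng=0))
```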
C Wang, L Shi, J Luo - Algorithms, 2025 - mdpi.com
In contextual multi-armed bandits, the relationship between contextual information and rewards is typically unknown, complicating the trade-off between exploration and …
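As a concrete baseline for that trade-off, here is a minimal epsilon-greedy contextual bandit that learns the unknown context-reward relationship online with a per-arm ridge model (a generic illustration; all names and defaults are ours, not this paper's method):

```python
import numpy as np

class EpsGreedyLinear:
    """Epsilon-greedy contextual bandit with per-arm online ridge regression.
    With probability eps a random arm is explored; otherwise the arm with the
    highest estimated reward is exploited. (Generic illustration only.)"""
    def __init__(self, n_arms, dim, eps=0.1, lam=1.0, rng=None):
        self.eps = eps
        self.rng = np.random.default_rng(rng)
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]   # per-arm X'X + lam*I
        self.b = [np.zeros(dim) for _ in range(n_arms)]       # per-arm X'y

    def select(self, x):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.A)))        # explore
        est = [x @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(est))                            # exploit

    def update(self, arm, x, r):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += r * x

# Toy run: arm 1 is better whenever the first context feature is positive.
rng = np.random.default_rng(0)
agent = EpsGreedyLinear(n_arms=2, dim=3, rng=0)
for _ in range(500):
    x = rng.normal(size=3)
    a = agent.select(x)
    r = float(a == int(x[0] > 0)) + 0.1 * rng.normal()
    agent.update(a, x, r)
```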
Y Ban, Y Qi, J He - Companion Proceedings of the ACM on Web …, 2024 - dl.acm.org
In the dynamic landscape of online businesses, recommender systems are pivotal in enhancing user experiences. While traditional approaches have relied on static supervised …
Contextual bandit algorithms aim to identify the arm with the highest expected reward among a set of candidates, based on the available contextual information. Among these …
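A canonical instance of this arm-selection principle is LinUCB, whose rule (standard in the literature, shown here for orientation rather than as this paper's method) adds an optimism bonus to a per-arm ridge estimate:

```latex
% LinUCB arm selection: ridge estimate plus an upper-confidence bonus.
\[
  a_t = \arg\max_{a}
  \Bigl( x_{t,a}^{\top}\hat{\theta}_a
       + \alpha \sqrt{x_{t,a}^{\top} A_a^{-1}\, x_{t,a}} \Bigr),
  \qquad
  A_a = \lambda I + \sum_{s \le t:\, a_s = a} x_{s,a}\, x_{s,a}^{\top},
\]
% where \hat{\theta}_a = A_a^{-1} b_a with b_a = \sum_{s: a_s = a} r_s x_{s,a},
% and \alpha controls the strength of exploration.
```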
Current recommender systems predominantly employ supervised learning algorithms, which often fail to optimize for long-term user engagement. This short-sighted approach highlights …