Federated neural bandits

Z Dai, Y Shu, A Verma, FX Fan, BKH Low… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent works on neural contextual bandits have achieved compelling performance due to
their ability to leverage the strong representation power of neural networks (NNs) for reward …

Meta clustering of neural bandits

Y Ban, Y Qi, T Wei, L Liu, J He - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
The contextual bandit has been identified as a powerful framework to formulate the
recommendation process as a sequential decision-making process, where each item is …

Reward-biased maximum likelihood estimation for neural contextual bandits: a distributional learning perspective

YH Hung, PC Hsieh - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the
adaptive control literature for tackling explore-exploit trade-offs. This paper studies the …

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Z Zhu, Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2310.07786, 2023 - arxiv.org
Real-world applications of contextual bandits often exhibit non-stationarity due to
seasonality, serendipity, and evolving social trends. While a number of non-stationary …

Viper: Provably efficient algorithm for offline RL with neural function approximation

T Nguyen-Tang, R Arora - arXiv preprint arXiv:2302.12780, 2023 - arxiv.org
We propose a novel algorithm for offline reinforcement learning called Value Iteration with
Perturbed Rewards (VIPeR), which amalgamates the pessimism principle with random …

An optimization-based algorithm for non-stationary kernel bandits without prior knowledge

K Hong, Y Li, A Tewari - International Conference on …, 2023 - proceedings.mlr.press
We propose an algorithm for non-stationary kernel bandits that does not require prior
knowledge of the degree of non-stationarity. The algorithm follows randomized strategies …

Adaptive Noise Exploration for Neural Contextual Multi-Armed Bandits

C Wang, L Shi, J Luo - Algorithms, 2025 - mdpi.com
In contextual multi-armed bandits, the relationship between contextual information and
rewards is typically unknown, complicating the trade-off between exploration and …

Neural Contextual Bandits for Personalized Recommendation

Y Ban, Y Qi, J He - Companion Proceedings of the ACM on Web …, 2024 - dl.acm.org
In the dynamic landscape of online businesses, recommender systems are pivotal in
enhancing user experiences. While traditional approaches have relied on static supervised …

Robust Neural Contextual Bandit against Adversarial Corruptions

Y Qi, Y Ban, A Banerjee, J He - The Thirty-eighth Annual Conference on … - openreview.net
Contextual bandit algorithms aim to identify the optimal arm with the highest reward among
a set of candidates, based on the available contextual information. Among these …

Efficient Deep Reinforcement Learning for Recommender Systems

Z Zhu - 2023 - search.proquest.com
Current recommender systems predominantly employ supervised learning algorithms, which
often fail to optimize for long-term user engagement. This short-sighted approach highlights …