Y Ban, Y Qi, T Wei, L Liu, J He - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
The contextual bandit has been identified as a powerful framework for formulating recommendation as a sequential decision-making process, where each item is …
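For orientation, the standard contextual bandit protocol that such recommendation papers build on (a generic textbook formulation, not specific to this paper) can be stated as follows:

```latex
% Generic contextual bandit protocol: at each round t = 1, ..., T the learner
% observes a context x_{t,a} for every candidate arm (item) a, pulls one arm
% a_t, and sees only that arm's reward r_{t,a_t}. Performance is measured by
% cumulative regret against the per-round optimal arm:
\[
  R(T) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}
      \bigl( r_{t,a_t^{*}} - r_{t,a_t} \bigr)\right],
  \qquad
  a_t^{*} \;=\; \arg\max_{a}\, \mathbb{E}\bigl[r_{t,a} \mid x_{t,a}\bigr].
\]
```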
YH Hung, PC Hsieh - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the …
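As a quick sketch of the principle, one common bandit formulation of RBMLE from the broader literature (our paraphrase, not necessarily this paper's exact objective) biases the log-likelihood toward parameters that promise higher rewards:

```latex
% Reward-biased MLE: the log-likelihood is augmented with a bias term
% alpha_t * (best achievable mean reward under theta); the bias weight
% grows slowly with t and induces exploration toward optimistic parameters.
\[
  \hat{\theta}_t \in \arg\max_{\theta}
  \Bigl\{ \sum_{s=1}^{t-1} \log p\bigl(r_s \mid a_s; \theta\bigr)
        \;+\; \alpha_t \max_{a}\, \mu(a; \theta) \Bigr\},
  \qquad
  a_t = \arg\max_{a}\, \mu(a; \hat{\theta}_t),
\]
% where \mu(a; \theta) is the mean reward of arm a under \theta and
% \alpha_t is an increasing bias weight (e.g., on the order of \sqrt{t}).
```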
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary …
We propose a novel algorithm for offline reinforcement learning called Value Iteration with Perturbed Rewards (VIPeR), which amalgamates the pessimism principle with random …
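To make the perturbed-rewards idea concrete, here is a minimal one-step sketch (our illustration, using a linear function class and made-up hyperparameters; VIPeR itself performs value iteration over a horizon with neural function approximation):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression: theta = (X'X + lam*I)^{-1} X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def perturbed_pessimistic_fit(X, y, n_models=10, noise_std=1.0, lam=1.0, rng=None):
    """Fit an ensemble on independently noise-perturbed copies of the rewards;
    a pessimistic estimate is then the pointwise minimum over the ensemble.
    (Illustrative hyperparameters; not the paper's implementation.)"""
    rng = np.random.default_rng(rng)
    thetas = []
    for _ in range(n_models):
        y_pert = y + rng.normal(0.0, noise_std, size=y.shape)
        thetas.append(ridge_fit(X, y_pert, lam))
    return np.stack(thetas)                 # shape (n_models, d)

def pessimistic_value(thetas, x):
    # The min over the perturbed ensemble plays the role of an explicit
    # pessimism bonus (a lower-confidence estimate of the value).
    return float(np.min(thetas @ x))

# Toy usage: offline data from a one-step problem with linear rewards.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
theta_true = np.array([1.0, -0.5, 0.2, 0.0, 0.3])
y = X @ theta_true + 0.1 * rng.normal(size=200)
thetas = perturbed_pessimistic_fit(X, y, rng=1)
print(pessimistic_value(thetas, X[0]), X[0] @ theta_true)
```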
K Hong, Y Li, A Tewari - International Conference on …, 2023 - proceedings.mlr.press
We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies …
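The snippet does not say which randomized strategy the paper uses; one generic way to avoid committing to a known degree of non-stationarity is to randomize the amount of history a base learner sees. A toy sketch (our illustration with plain UCB over random windows, not the paper's kernel method):

```python
import numpy as np

def random_window_bandit(reward_fn, n_arms, T, rng=None):
    """Repeatedly sample a random window length, run a fresh UCB instance
    inside it, then discard history. Randomizing the window removes the need
    to know the degree of non-stationarity in advance. (Illustration only.)"""
    rng = np.random.default_rng(rng)
    total, t = 0.0, 0
    while t < T:
        w = int(rng.geometric(1.0 / np.sqrt(T))) + n_arms    # random window length
        counts = np.zeros(n_arms); sums = np.zeros(n_arms)
        for s in range(min(w, T - t)):
            if s < n_arms:
                a = s                                        # pull each arm once
            else:
                ucb = sums / counts + np.sqrt(2 * np.log(s + 1) / counts)
                a = int(np.argmax(ucb))
            r = reward_fn(t, a)
            counts[a] += 1; sums[a] += r; total += r; t += 1
        # Window ends: history is dropped, adapting to possible reward changes.
    return total

# Toy non-stationary environment: the best arm switches halfway through.
best = lambda t: 0 if t < 500 else 1
print(random_window_bandit(lambda t, a: float(a == best(t)), n_arms=2, T=1000, rng=0))
```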
C Wang, L Shi, J Luo - Algorithms, 2025 - mdpi.com
In contextual multi-armed bandits, the relationship between contextual information and rewards is typically unknown, complicating the trade-off between exploration and …
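As a concrete baseline for that trade-off, here is a minimal epsilon-greedy contextual bandit that learns the unknown context-reward relationship online with a per-arm ridge model (a generic illustration; all names and defaults are ours, not this paper's method):

```python
import numpy as np

class EpsGreedyLinear:
    """Epsilon-greedy contextual bandit with per-arm online ridge regression.
    With probability eps a random arm is explored; otherwise the arm with the
    highest estimated reward is exploited. (Generic illustration only.)"""
    def __init__(self, n_arms, dim, eps=0.1, lam=1.0, rng=None):
        self.eps = eps
        self.rng = np.random.default_rng(rng)
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]   # per-arm X'X + lam*I
        self.b = [np.zeros(dim) for _ in range(n_arms)]       # per-arm X'y

    def select(self, x):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.A)))        # explore
        est = [x @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(est))                            # exploit

    def update(self, arm, x, r):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += r * x

# Toy run: arm 1 is better whenever the first context feature is positive.
rng = np.random.default_rng(0)
agent = EpsGreedyLinear(n_arms=2, dim=3, rng=0)
for _ in range(500):
    x = rng.normal(size=3)
    a = agent.select(x)
    r = float(a == int(x[0] > 0)) + 0.1 * rng.normal()
    agent.update(a, x, r)
```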
Y Ban, Y Qi, J He - Companion Proceedings of the ACM on Web …, 2024 - dl.acm.org
In the dynamic landscape of online businesses, recommender systems are pivotal in enhancing user experiences. While traditional approaches have relied on static supervised …
Contextual bandit algorithms aim to identify the arm with the highest expected reward among a set of candidates, based on the available contextual information. Among these …
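A canonical instance of this arm-selection principle is LinUCB, whose rule (standard in the literature, shown here for orientation rather than as this paper's method) adds an optimism bonus to a per-arm ridge estimate:

```latex
% LinUCB arm selection: ridge estimate plus an upper-confidence bonus.
\[
  a_t = \arg\max_{a}
  \Bigl( x_{t,a}^{\top}\hat{\theta}_a
       + \alpha \sqrt{x_{t,a}^{\top} A_a^{-1}\, x_{t,a}} \Bigr),
  \qquad
  A_a = \lambda I + \sum_{s \le t:\, a_s = a} x_{s,a}\, x_{s,a}^{\top},
\]
% where \hat{\theta}_a = A_a^{-1} b_a with b_a = \sum_{s: a_s = a} r_s x_{s,a},
% and \alpha controls the strength of exploration.
```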
Current recommender systems predominantly employ supervised learning algorithms, which often fail to optimize for long-term user engagement. This short-sighted approach highlights …