Multi-task learning for contextual bandits

Y Zhang, Q Yang - IEEE transactions on knowledge and data …, 2021 - ieeexplore.ieee.org

Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to
leverage useful information contained in multiple related tasks to help improve the …

Cited by 2993 Related articles All 7 versions

[PDF] iospress.com

Reinforcement learning for personalization: A systematic literature review

F Den Hengst, EM Grua, A el Hassouni… - Data …, 2020 - content.iospress.com

The major application areas of reinforcement learning (RL) have traditionally been game
playing and continuous control. In recent years, however, RL has been increasingly applied …

Cited by 50 Related articles All 13 versions

[PDF] thecvf.com

Adaptive methods for real-world domain generalization

A Dubey, V Ramanathan… - Proceedings of the …, 2021 - openaccess.thecvf.com

Invariant approaches have been remarkably successful in tackling the problem of domain
generalization, where the objective is to perform inference on data distributions different …

Cited by 98 Related articles All 8 versions

[PDF] mlr.press

Meta-Thompson sampling

B Kveton, M Konobeev, M Zaheer… - International …, 2021 - proceedings.mlr.press

Efficient exploration in bandits is a fundamental online learning problem. We propose a
variant of Thompson sampling that learns to explore better as it interacts with bandit …

Cited by 77 Related articles All 11 versions

[PDF] mdpi.com

Designing reinforcement learning algorithms for digital interventions: pre-implementation guidelines

AL Trella, KW Zhang, I Nahum-Shani, V Shetty… - Algorithms, 2022 - mdpi.com

Online reinforcement learning (RL) algorithms are increasingly used to personalize digital
interventions in the fields of mobile health and online education. Common challenges in …

Cited by 44 Related articles All 12 versions

[PDF] mlr.press

Hierarchical Bayesian bandits

J Hong, B Kveton, M Zaheer… - International …, 2022 - proceedings.mlr.press

Meta-, multi-task, and federated learning can be all viewed as solving similar tasks,
drawn from a distribution that reflects task similarities. We provide a unified view of all these …

Cited by 45 Related articles All 4 versions

[PDF] mlr.press

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press

Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

Cited by 34 Related articles All 2 versions

[PDF] neurips.cc

No regrets for learning the prior in bandits

S Basu, B Kveton, M Zaheer… - Advances in neural …, 2021 - proceedings.neurips.cc

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to
bandit tasks that it interacts with. The key idea in AdaTS is to adapt to an unknown task prior …

Cited by 38 Related articles All 7 versions

[PDF] neurips.cc

Online clustering of bandits with misspecified user models

Z Wang, J Xie, X Liu, S Li, J Lui - Advances in Neural …, 2024 - proceedings.neurips.cc

The contextual linear bandit is an important online learning problem where given arm
features, a learning agent selects an arm at each round to maximize the cumulative rewards …

Cited by 11 Related articles All 5 versions

[PDF] mlr.press

Meta-learning with stochastic linear bandits

L Cella, A Lazaric, M Pontil - International Conference on …, 2020 - proceedings.mlr.press

We investigate meta-learning procedures in the setting of stochastic linear bandits tasks.
The goal is to select a learning algorithm which works well on average over a class of …

Cited by 70 Related articles All 8 versions
