Contextual multi-armed bandits

MG Azar, ZD Guo, B Piot, R Munos… - International …, 2024 - proceedings.mlr.press

The prevalent deployment of learning from human preferences through reinforcement
learning (RLHF) relies on two important approximations: the first assumes that pairwise …

Cited by 304 · Related articles · All 4 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1190 · Related articles · All 7 versions
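The snippet above describes bandits as a framework for making decisions over time under uncertainty. As an illustration only, here is a minimal sketch of UCB1, one standard index policy for the stochastic multi-armed bandit. It is not taken from the listed book; the function name `ucb1`, the simulated Bernoulli arm means, the horizon and the seed are all hypothetical choices made for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 against simulated Bernoulli arms.

    arm_means: hypothetical success probability of each arm (unknown to the policy).
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # summed reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Optimism under uncertainty: empirical mean plus confidence bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


# Usage: three simulated arms; the third (mean 0.8) is the best one.
counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

The confidence bonus shrinks as an arm is pulled more often, so pulls of clearly worse arms become rare over time while every arm keeps being revisited occasionally.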

[PDF] koreamed.org

[PDF] Deep learning

I Goodfellow - 2016 - synapse.koreamed.org

An introduction to a broad range of topics in deep learning, covering mathematical and
conceptual background, deep learning techniques used in industry, and research …

Cited by 70823 · Related articles

[PDF] academia.edu

[BOOK] Deep learning

Y Bengio, I Goodfellow, A Courville - 2017 - academia.edu

Inventors have long dreamed of creating machines that think. Ancient Greek myths tell of
intelligent objects, such as animated statues of human beings and tables that arrive full of …

Cited by 2119 · Related articles · All 4 versions

[PDF] acm.org

Bandits with knapsacks

A Badanidiyuru, R Kleinberg, A Slivkins - Journal of the ACM (JACM), 2018 - dl.acm.org

Multi-armed bandit problems are the predominant theoretical model of exploration-
exploitation tradeoffs in learning, and they have countless applications ranging from medical …

Cited by 522 · Related articles · All 11 versions

[PDF] ieee.org

Context-aware proactive content caching with service differentiation in wireless networks

S Müller, O Atan, M Van Der Schaar… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

Content caching in small base stations or wireless infostations is considered to be a suitable
approach to improve the efficiency in wireless content delivery. Placing the optimal content …

Cited by 293 · Related articles · All 6 versions

[PDF] researchgate.net

MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones

M Rabbi, MH Aung, M Zhang… - Proceedings of the 2015 …, 2015 - dl.acm.org

Mobile sensing systems have made significant advances in tracking human behavior.
However, the development of personalized mobile health feedback systems is still in its …

Cited by 297 · Related articles · All 11 versions

[PDF] ieee.org

Wireless recommendations for Internet of vehicles: Recent advances, challenges, and opportunities

T Li, C Li, J Luo, L Song - Intelligent and Converged Networks, 2020 - ieeexplore.ieee.org

Internet of Vehicles (IoV) is a distributed network of connected cars, roadside infrastructure,
wireless communication networks, and central cloud platforms. Wireless recommendations …

Cited by 31 · Related articles · All 5 versions

[PDF] ambujtewari.com

From ads to interventions: Contextual bandits in mobile health

A Tewari, SA Murphy - Mobile health: sensors, analytic methods, and …, 2017 - Springer

The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …

Cited by 238 · Related articles · All 5 versions

[PDF] mlr.press

Contextual bandits with similarity information

A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices.
In each round it chooses from a time-invariant set of alternatives and receives the payoff …

Cited by 486 · Related articles · All 12 versions
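The snippet above states the basic protocol: in each round the algorithm picks one of a fixed set of alternatives and observes a payoff. When a context is revealed before each choice and nearby contexts are assumed to have similar expected payoffs, one simple baseline is to partition the context space uniformly and run an independent UCB1 instance in each cell. The sketch below does this for contexts in [0, 1). It is a naive uniform-partition baseline, not the algorithm of the listed paper; `binned_ucb1`, the `payoff` reward model and all parameter values are hypothetical.

```python
import math
import random


def binned_ucb1(reward_prob, n_arms, n_bins, horizon, seed=0):
    """Contextual bandit baseline: split contexts x in [0, 1) into n_bins
    equal-width cells and run an independent UCB1 instance in each cell.

    reward_prob(x, arm): hypothetical Bernoulli success probability.
    Returns (per-cell pull counts, total reward).
    """
    rng = random.Random(seed)
    counts = [[0] * n_arms for _ in range(n_bins)]
    sums = [[0.0] * n_arms for _ in range(n_bins)]
    total = 0.0
    for _ in range(horizon):
        x = rng.random()                      # context revealed before choosing
        b = min(int(x * n_bins), n_bins - 1)  # cell containing x
        c, s = counts[b], sums[b]
        t_b = sum(c) + 1                      # round index within this cell
        untried = [a for a in range(n_arms) if c[a] == 0]
        if untried:
            arm = untried[0]                  # pull each arm once per cell first
        else:
            arm = max(
                range(n_arms),
                key=lambda a: s[a] / c[a] + math.sqrt(2.0 * math.log(t_b) / c[a]),
            )
        reward = 1.0 if rng.random() < reward_prob(x, arm) else 0.0
        c[arm] += 1
        s[arm] += reward
        total += reward
    return counts, total


# Usage: arm 0 pays off more often for large x, arm 1 for small x.
def payoff(x, arm):
    return x if arm == 0 else 1.0 - x


counts, total = binned_ucb1(payoff, n_arms=2, n_bins=4, horizon=4000)
print(counts, total)
```

A finer partition lets the policy follow payoffs that change more sharply with the context, but each cell then receives fewer rounds and spends a larger share of them exploring; adaptive partitioning schemes aim to balance that trade-off.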


A general theoretical paradigm to understand learning from human preferences