A general theoretical paradigm to understand learning from human preferences

MG Azar, ZD Guo, B Piot, R Munos… - International …, 2024 - proceedings.mlr.press
The prevalent deployment of learning from human preferences through reinforcement
learning (RLHF) relies on two important approximations: the first assumes that pairwise …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

[PDF] Deep learning

I Goodfellow - 2016 - synapse.koreamed.org
An introduction to a broad range of topics in deep learning, covering mathematical and
conceptual background, deep learning techniques used in industry, and research …

[BOOK] Deep learning

Y Bengio, I Goodfellow, A Courville - 2017 - academia.edu
Inventors have long dreamed of creating machines that think. Ancient Greek myths tell of
intelligent objects, such as animated statues of human beings and tables that arrive full of …

Bandits with knapsacks

A Badanidiyuru, R Kleinberg, A Slivkins - Journal of the ACM (JACM), 2018 - dl.acm.org
Multi-armed bandit problems are the predominant theoretical model of
exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical …

Context-aware proactive content caching with service differentiation in wireless networks

S Müller, O Atan, M Van Der Schaar… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Content caching in small base stations or wireless infostations is considered to be a suitable
approach to improve the efficiency in wireless content delivery. Placing the optimal content …

MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones

M Rabbi, MH Aung, M Zhang… - Proceedings of the 2015 …, 2015 - dl.acm.org
Mobile sensing systems have made significant advances in tracking human behavior.
However, the development of personalized mobile health feedback systems is still in its …

Wireless recommendations for Internet of vehicles: Recent advances, challenges, and opportunities

T Li, C Li, J Luo, L Song - Intelligent and Converged Networks, 2020 - ieeexplore.ieee.org
Internet of Vehicles (IoV) is a distributed network of connected cars, roadside infrastructure,
wireless communication networks, and central cloud platforms. Wireless recommendations …

From ads to interventions: Contextual bandits in mobile health

A Tewari, SA Murphy - Mobile health: sensors, analytic methods, and …, 2017 - Springer
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …

Contextual bandits with similarity information

A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices.
In each round it chooses from a time-invariant set of alternatives and receives the payoff …
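The snippet above describes the basic multi-armed bandit protocol: in every round an online algorithm picks one option from a fixed set of alternatives and observes only that option's payoff. The sketch below simulates that loop. It assumes Bernoulli payoffs with hypothetical means and uses epsilon-greedy as the selection rule. Epsilon-greedy is a generic illustrative policy, not the algorithm from the cited paper, and `run_bandit` and its parameters are hypothetical names.

```python
import random


def run_bandit(arm_means, rounds, epsilon=0.1, seed=0):
    """Simulate the multi-armed bandit protocol with an epsilon-greedy policy.

    Illustrative sketch only: arm_means are hypothetical Bernoulli success
    probabilities, and epsilon-greedy is a stand-in policy, not the method
    of any cited paper.
    """
    rng = random.Random(seed)
    k = len(arm_means)           # the set of alternatives is fixed (time-invariant)
    counts = [0] * k             # how many times each arm has been chosen
    totals = [0.0] * k           # cumulative payoff observed for each arm
    reward_sum = 0.0
    for t in range(rounds):
        if t < k:
            arm = t                          # play every arm once to initialize estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(k)           # explore: pick a uniformly random arm
        else:
            # exploit: pick the arm with the highest empirical mean payoff
            arm = max(range(k), key=lambda a: totals[a] / counts[a])
        # only the chosen arm's payoff is revealed (bandit feedback)
        payoff = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += payoff
        reward_sum += payoff
    return counts, reward_sum


counts, total = run_bandit([0.2, 0.5, 0.8], rounds=5000)
print(counts, total)
```

With enough rounds, the arm with the highest mean payoff should receive most of the pulls, while the fixed exploration rate `epsilon` keeps sampling the other arms so that an unlucky early estimate can be corrected.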