When privacy meets partial information: A refined analysis of differentially private bandits

A Azize, D Basu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We study the problem of multi-armed bandits with ε-global Differential Privacy (DP). First, we
prove the minimax and problem-dependent regret lower bounds for stochastic and linear …

Differentially private reinforcement learning with linear function approximation

X Zhou - Proceedings of the ACM on Measurement and Analysis …, 2022 - dl.acm.org
Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized
services, where users' sensitive and private information needs to be protected, we study …

Offline reinforcement learning with differential privacy

D Qiao, YX Wang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The offline reinforcement learning (RL) problem is often motivated by the need to learn data-driven
decision policies in financial, legal and healthcare applications. However, the learned …

Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits

B Hu, N Hegde - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press
We address differentially private stochastic bandits. We present two (near)-optimal
Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in …

Differentially private regret minimization in episodic Markov decision processes

SR Chowdhury, X Zhou - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
We study regret minimization in finite horizon tabular Markov decision processes (MDPs)
under the constraints of differential privacy (DP). This is motivated by the widespread …

Concentrated differential privacy for bandits

A Azize, D Basu - 2024 IEEE Conference on Secure and …, 2024 - ieeexplore.ieee.org
Bandits serve as the theoretical foundation of sequential learning and an algorithmic
foundation of modern recommender systems. However, recommender systems often rely on …

Differentially private algorithms for efficient online matroid optimization

K Chandak, B Hu, N Hegde - Conference on Lifelong …, 2023 - proceedings.mlr.press
A matroid bandit is the online version of combinatorial optimization on a matroid, in which
the learner chooses $K$ actions from a set of $L$ actions that can form a matroid basis …

Littlestone classes are privately online learnable

N Golowich, R Livni - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We consider the problem of online classification under a privacy constraint. In this setting a
learner observes sequentially a stream of labelled examples $(x_t, y_t)$, for $1\leq t\leq T …

Thompson Sampling Itself is Differentially Private

T Ou, R Cummings, M Avella - International Conference on …, 2024 - proceedings.mlr.press
In this work we first show that the classical Thompson sampling algorithm for multi-arm
bandits is differentially private as-is, without any modification. We provide per-round privacy …

Bandit algorithms with graphical feedback models and privacy awareness

B Hu - 2021 - dspace.library.uvic.ca
This thesis focuses on two classes of learning problems in stochastic multi-armed bandits
(MAB): graphical bandits and private bandits. Different from the basic MAB setting where the …