Optimal algorithms for private online learning in a stochastic environment

A Azize, D Basu - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We study the problem of multi-armed bandits with ε-global Differential Privacy (DP). First, we
prove the minimax and problem-dependent regret lower bounds for stochastic and linear …

Cited by 20 · Related articles · All 10 versions

[PDF] arxiv.org

Differentially private reinforcement learning with linear function approximation

X Zhou - Proceedings of the ACM on Measurement and Analysis …, 2022 - dl.acm.org

Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized
services, where users' sensitive and private information needs to be protected, we study …

Cited by 32 · Related articles · All 6 versions

[PDF] neurips.cc

Offline reinforcement learning with differential privacy

D Qiao, YX Wang - Advances in Neural Information …, 2024 - proceedings.neurips.cc

The offline reinforcement learning (RL) problem is often motivated by the need to learn
data-driven decision policies in financial, legal and healthcare applications. However, the learned …

Cited by 20 · Related articles · All 7 versions

[PDF] mlr.press

Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits

B Hu, N Hegde - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press

We address differentially private stochastic bandits. We present two (near)-optimal
Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in …

Cited by 14 · Related articles · All 3 versions

[PDF] aaai.org

Differentially private regret minimization in episodic Markov decision processes

SR Chowdhury, X Zhou - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org

We study regret minimization in finite horizon tabular Markov decision processes (MDPs)
under the constraints of differential privacy (DP). This is motivated by the widespread …

Cited by 19 · Related articles · All 5 versions

[PDF] openreview.net

Concentrated differential privacy for bandits

A Azize, D Basu - 2024 IEEE Conference on Secure and …, 2024 - ieeexplore.ieee.org

Bandits serve as the theoretical foundation of sequential learning and an algorithmic
foundation of modern recommender systems. However, recommender systems often rely on …

Cited by 4 · Related articles · All 5 versions

[PDF] mlr.press

Differentially private algorithms for efficient online matroid optimization

K Chandak, B Hu, N Hegde - Conference on Lifelong …, 2023 - proceedings.mlr.press

A matroid bandit is the online version of combinatorial optimization on a matroid, in which
the learner chooses $K$ actions from a set of $L$ actions that can form a matroid basis …

Cited by 1 · Related articles · All 4 versions

[PDF] neurips.cc

Littlestone classes are privately online learnable

N Golowich, R Livni - Advances in Neural Information …, 2021 - proceedings.neurips.cc

We consider the problem of online classification under a privacy constraint. In this setting a
learner observes sequentially a stream of labelled examples $(x_t, y_t)$, for $1\leq t\leq T …

Cited by 11 · Related articles · All 6 versions

[PDF] mlr.press

Thompson Sampling Itself is Differentially Private

T Ou, R Cummings, M Avella - International Conference on …, 2024 - proceedings.mlr.press

In this work we first show that the classical Thompson sampling algorithm for multi-arm
bandits is differentially private as-is, without any modification. We provide per-round privacy …

[PDF] uvic.ca

Bandit algorithms with graphical feedback models and privacy awareness

B Hu - 2021 - dspace.library.uvic.ca

This thesis focuses on two classes of learning problems in stochastic multi-armed bandits
(MAB): graphical bandits and private bandits. Different from the basic MAB setting where the …
