Federated Q-learning: Linear regret speedup with low communication cost

Z Zheng, F Gao, L Xue, J Yang - arXiv preprint arXiv:2312.15023, 2023 - arxiv.org
In this paper, we consider federated reinforcement learning for tabular episodic Markov
Decision Processes (MDP) where, under the coordination of a central server, multiple …

A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation

H Zhao, J He, Q Gu - arXiv preprint arXiv:2311.15238, 2023 - arxiv.org
The exploration-exploitation dilemma has been a central challenge in reinforcement
learning (RL) with complex model classes. In this paper, we propose a new algorithm …

Regret-optimal model-free reinforcement learning for discounted MDPs with short burn-in time

X Ji, G Li - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
A crucial problem in reinforcement learning is learning the optimal policy. We study this in
tabular infinite-horizon discounted Markov decision processes under the online setting. The …

On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures

M Yin, M Wang, YX Wang - arXiv preprint arXiv:2501.02089, 2025 - arxiv.org
This article reviews the recent advances on the statistical foundation of reinforcement
learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL …

A reduction-based framework for sequential decision making with delayed feedback

Y Yang, H Zhong, T Wu, B Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study stochastic delayed feedback in general single-agent and multi-agent sequential
decision making, which includes bandits, single-agent Markov decision processes (MDPs) …

Near-optimal deployment efficiency in reward-free reinforcement learning with linear function approximation

D Qiao, YX Wang - arXiv preprint arXiv:2210.00701, 2022 - arxiv.org
We study the problem of deployment efficient reinforcement learning (RL) with linear
function approximation under the reward-free exploration setting. This is a well …

Logarithmic switching cost in reinforcement learning beyond linear MDPs

D Qiao, M Yin, YX Wang - arXiv preprint arXiv:2302.12456, 2023 - arxiv.org
In many real-life reinforcement learning (RL) problems, deploying new policies is costly. In
those scenarios, algorithms must solve exploration (which requires adaptivity) while …

Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

E Johnson, C Pike-Burke, P Rebeschini - arXiv preprint arXiv:2310.01616, 2023 - arxiv.org
We theoretically explore the relationship between sample-efficiency and adaptivity in
reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $ n …

No-regret Exploration in Shuffle Private Reinforcement Learning

S Bai, MS Talebi, C Zhao, P Cheng, J Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Differential privacy (DP) has recently been introduced into episodic reinforcement learning
(RL) to formally address user privacy concerns in personalized services. Previous work …

Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

D Qiao, YX Wang - arXiv preprint arXiv:2402.01111, 2024 - arxiv.org
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity
constraints, a new problem motivated by real-world applications where deployments of new …