The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm …
X Ji, G Li - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The …
M Yin, M Wang, YX Wang - arXiv preprint arXiv:2501.02089, 2025 - arxiv.org
This article reviews the recent advances on the statistical foundation of reinforcement learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL …
We study stochastic delayed feedback in general single-agent and multi-agent sequential decision making, which includes bandits, single-agent Markov decision processes (MDPs) …
D Qiao, YX Wang - arXiv preprint arXiv:2210.00701, 2022 - arxiv.org
We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the reward-free exploration setting. This is a well …
D Qiao, M Yin, YX Wang - arXiv preprint arXiv:2302.12456, 2023 - arxiv.org
In many real-life reinforcement learning (RL) problems, deploying new policies is costly. In those scenarios, algorithms must solve exploration (which requires adaptivity) while …
E Johnson, C Pike-Burke, P Rebeschini - arXiv preprint arXiv:2310.01616, 2023 - arxiv.org
We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $ n …
Differential privacy (DP) has recently been introduced into episodic reinforcement learning (RL) to formally address user privacy concerns in personalized services. Previous work …
D Qiao, YX Wang - arXiv preprint arXiv:2402.01111, 2024 - arxiv.org
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints, a new problem motivated by real-world applications where deployments of new …