Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency

H Zhao, J He, D Zhou, T Zhang… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Recently, several studies (zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally) have provided variance-dependent regret bounds for linear …

Tackling heavy-tailed rewards in reinforcement learning with function approximation: Minimax optimal and instance-dependent regret bounds

J Huang, H Zhong, L Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
While numerous works have focused on devising efficient algorithms for reinforcement
learning (RL) with uniformly bounded rewards, it remains an open question whether sample …

Is behavior cloning all you need? Understanding horizon in imitation learning

DJ Foster, A Block, D Misra - arXiv preprint arXiv:2407.15007, 2024 - arxiv.org
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision
making task by learning from demonstrations, and has been widely applied to robotics …

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …

Federated Q-learning: Linear regret speedup with low communication cost

Z Zheng, F Gao, L Xue, J Yang - arXiv preprint arXiv:2312.15023, 2023 - arxiv.org
In this paper, we consider federated reinforcement learning for tabular episodic Markov
Decision Processes (MDP) where, under the coordination of a central server, multiple …

Horizon-free and instance-dependent regret bounds for reinforcement learning with general function approximation

J Huang, H Zhong, L Wang… - … Conference on Artificial …, 2024 - proceedings.mlr.press
To tackle long planning horizon problems in reinforcement learning with general function
approximation, we propose the first algorithm, termed as UCRL-WVTR, that achieves …

State-free Reinforcement Learning

M Chen, A Pacchiano, X Zhang - arXiv preprint arXiv:2409.18439, 2024 - arxiv.org
In this work, we study the state-free RL problem, where the algorithm does not have
the states information before interacting with the environment. Specifically, denote the …

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Z Wang, D Zhou, J Lui, W Sun - arXiv preprint arXiv:2408.08994, 2024 - arxiv.org
Learning a transition model via Maximum Likelihood Estimation (MLE) followed by planning
inside the learned model is perhaps the most standard and simplest Model-based …

Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost

Z Zheng, H Zhang, L Xue - arXiv preprint arXiv:2405.18795, 2024 - arxiv.org
In this paper, we consider model-free federated reinforcement learning for tabular episodic
Markov decision processes. Under the coordination of a central server, multiple agents …