Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved, zhou2022computationally} have provided variance-dependent regret bounds for linear …
While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample …
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision-making task by learning from demonstrations, and has been widely applied to robotics …
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the return distribution, can obtain second-order bounds in both online and offline RL in general …
In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDPs) where, under the coordination of a central server, multiple …
J Huang, H Zhong, L Wang… - … Conference on Artificial …, 2024 - proceedings.mlr.press
To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed UCRL-WVTR, that achieves …
In this work, we study the \textit{state-free RL} problem, where the algorithm does not have the state information before interacting with the environment. Specifically, denote the …
Z Wang, D Zhou, J Lui, W Sun - arXiv preprint arXiv:2408.08994, 2024 - arxiv.org
Learning a transition model via Maximum Likelihood Estimation (MLE) followed by planning inside the learned model is perhaps the most standard and simplest Model-based …
Z Zheng, H Zhang, L Xue - arXiv preprint arXiv:2405.18795, 2024 - arxiv.org
In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents …