D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult than bandits, even with a long planning horizon and unknown state transitions. However …
Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a …
Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the …
Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved, zhou2022computationally} have provided variance-dependent regret bounds for linear …
We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially …
R Zhou, Z Zihan, SS Du - International Conference on …, 2023 - proceedings.mlr.press
We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit …
J Zhang, W Zhang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the …
The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map φ(x, a) that maps state-action pairs to d-dimensional …
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision-making task by learning from demonstrations, and has been widely applied to robotics …