VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …

Optimal multi-distribution learning

Z Zhang, W Zhan, Y Chen, SS Du… - The Thirty Seventh …, 2024 - proceedings.mlr.press
Multi-distribution learning (MDL), which seeks to learn a shared model that
minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a …

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency

H Zhao, J He, D Zhou, T Zhang… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …

GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond

H Zhong, W Xiong, S Zheng, L Wang, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We study sample efficient reinforcement learning (RL) under the general framework of
interactive decision making, which includes Markov decision process (MDP), partially …

Sharp variance-dependent bounds in reinforcement learning: Best of both worlds in stochastic and deterministic environments

R Zhou, Z Zhang, SS Du - International Conference on …, 2023 - proceedings.mlr.press
We study variance-dependent regret bounds for Markov decision processes (MDPs).
Algorithms with variance-dependent regret guarantees can automatically exploit …

Optimal horizon-free reward-free exploration for linear mixture MDPs

J Zhang, W Zhang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reward-free reinforcement learning (RL) with linear function approximation, where
the agent works in two phases: (1) in the exploration phase, the agent interacts with the …

Exploring and learning in sparse linear MDPs without computationally intractable oracles

N Golowich, A Moitra, D Rohatgi - Proceedings of the 56th Annual ACM …, 2024 - dl.acm.org
The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner
has access to a known feature map φ(x, a) that maps state-action pairs to d-dimensional …

Is behavior cloning all you need? Understanding horizon in imitation learning

DJ Foster, A Block, D Misra - arXiv preprint arXiv:2407.15007, 2024 - arxiv.org
Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision-making
task by learning from demonstrations, and has been widely applied to robotics …