VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …
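Note: for readers comparing settings, the standard linear mixture MDP assumption in this literature (a reference definition, not quoted from the snippet) posits a known feature map $\phi$ and an unknown parameter $\theta_h^{\ast}$ per step $h$ such that
$$P_h(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta_h^{\ast} \rangle, \qquad \phi(s' \mid s, a),\ \theta_h^{\ast} \in \mathbb{R}^d .$$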

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …
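Note: the data-efficiency measure at issue is the standard episodic regret over $K$ episodes, with $\pi_k$ the policy played in episode $k$ and $s_1^k$ its initial state:
$$\mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{\star}(s_1^k) - V_1^{\pi_k}(s_1^k) \Big).$$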

Horizon-free reinforcement learning in polynomial time: the power of stationary policies

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2022 - proceedings.mlr.press
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes
(MDP) that enjoys a regret bound independent of the planning horizon. Specifically …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …
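Note: for readers unfamiliar with the SSP setting (standard formulation, not quoted from this snippet), the agent seeks a policy minimizing expected cumulative cost until a designated goal state $g$ is reached:
$$V^{\pi}(s) = \mathbb{E}\Big[ \sum_{t=1}^{T_g} c(s_t, a_t) \;\Big|\; s_1 = s,\ \pi \Big],$$
where $T_g$ is the (random) time at which $g$ is first reached.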

Sharp variance-dependent bounds in reinforcement learning: Best of both worlds in stochastic and deterministic environments

R Zhou, Z Zhang, SS Du - International Conference on …, 2023 - proceedings.mlr.press
We study variance-dependent regret bounds for Markov decision processes (MDPs).
Algorithms with variance-dependent regret guarantees can automatically exploit …
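Note: the variance quantity such bounds typically scale with (standard notation, not taken verbatim from the paper) is the conditional variance of the next-step value function:
$$[\mathbb{V}_h V_{h+1}](s,a) = \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\big[ V_{h+1}(s')^2 \big] - \Big( \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\big[ V_{h+1}(s') \big] \Big)^2 .$$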

Optimal horizon-free reward-free exploration for linear mixture MDPs

J Zhang, W Zhang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reward-free reinforcement learning (RL) with linear function approximation, where
the agent works in two phases: (1) in the exploration phase, the agent interacts with the …

Improved no-regret algorithms for stochastic shortest path with linear MDP

L Chen, R Jain, H Luo - International Conference on …, 2022 - proceedings.mlr.press
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem
with a linear MDP that significantly improve over the only existing results of (Vial et al …
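Note: the linear MDP assumption here differs from the linear mixture model above. In its standard form (a reference definition rather than a quote), both transitions and costs factor through a known feature map $\phi(s,a)$:
$$P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle, \qquad c(s, a) = \langle \phi(s, a), \theta \rangle,$$
with $\mu$ an unknown vector of measures and $\theta$ an unknown parameter vector.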

Minimax regret for stochastic shortest path

A Cohen, Y Efroni, Y Mansour… - Advances in neural …, 2021 - proceedings.neurips.cc
We study the Stochastic Shortest Path (SSP) problem in which an agent has to
reach a goal state in minimum total expected cost. In the learning formulation of the problem …
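Note: in the learning formulation, performance over $K$ episodes is usually measured by the SSP regret (standard definition, paraphrased): the total cost incurred minus $K$ times the optimal expected cost from the initial state,
$$R_K = \sum_{k=1}^{K} \sum_{i=1}^{I_k} c(s_i^k, a_i^k) \;-\; K \cdot V^{\star}(s_{\mathrm{init}}),$$
where $I_k$ is the number of steps taken in episode $k$.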

Implicit finite-horizon approximation and efficient optimal algorithms for stochastic shortest path

L Chen, M Jafarnia-Jahromi… - Advances in Neural …, 2021 - proceedings.neurips.cc
We introduce a generic template for developing regret minimization algorithms in the
Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as …