VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …
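Note: for readers comparing settings, the standard linear mixture MDP assumption in this literature (a reference definition, not quoted from the snippet) posits a known feature map $\phi$ and an unknown parameter $\theta_h^{\ast}$ per step $h$ such that
$$P_h(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta_h^{\ast} \rangle, \qquad \phi(s' \mid s, a),\ \theta_h^{\ast} \in \mathbb{R}^d .$$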

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …
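Note: the data-efficiency measure at issue is the standard episodic regret over $K$ episodes, with $\pi_k$ the policy played in episode $k$ and $s_1^k$ its initial state:
$$\mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{\star}(s_1^k) - V_1^{\pi_k}(s_1^k) \Big).$$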

Horizon-free reinforcement learning in polynomial time: the power of stationary policies

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2022 - proceedings.mlr.press
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes
(MDP) that enjoys a regret bound independent of the planning horizon. Specifically …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …
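Note: for readers unfamiliar with the SSP setting (standard formulation, not quoted from this snippet), the agent seeks a policy minimizing expected cumulative cost until a designated goal state $g$ is reached:
$$V^{\pi}(s) = \mathbb{E}\Big[ \sum_{t=1}^{T_g} c(s_t, a_t) \;\Big|\; s_1 = s,\ \pi \Big],$$
where $T_g$ is the (random) time at which $g$ is first reached.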

Sharp variance-dependent bounds in reinforcement learning: Best of both worlds in stochastic and deterministic environments

R Zhou, Z Zhang, SS Du - International Conference on …, 2023 - proceedings.mlr.press
We study variance-dependent regret bounds for Markov decision processes (MDPs).
Algorithms with variance-dependent regret guarantees can automatically exploit …
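Note: the variance quantity such bounds typically scale with (standard notation, not taken verbatim from the paper) is the conditional variance of the next-step value function:
$$[\mathbb{V}_h V_{h+1}](s,a) = \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\big[ V_{h+1}(s')^2 \big] - \Big( \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\big[ V_{h+1}(s') \big] \Big)^2 .$$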

Optimal horizon-free reward-free exploration for linear mixture MDPs

J Zhang, W Zhang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reward-free reinforcement learning (RL) with linear function approximation, where
the agent works in two phases: (1) in the exploration phase, the agent interacts with the …

Improved no-regret algorithms for stochastic shortest path with linear MDP

L Chen, R Jain, H Luo - International Conference on …, 2022 - proceedings.mlr.press
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem
with a linear MDP that significantly improve over the only existing results of (Vial et al …
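Note: the linear MDP assumption here differs from the linear mixture model above. In its standard form (a reference definition rather than a quote), both transitions and costs factor through a known feature map $\phi(s,a)$:
$$P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle, \qquad c(s, a) = \langle \phi(s, a), \theta \rangle,$$
with $\mu$ an unknown vector of measures and $\theta$ an unknown parameter vector.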

Minimax regret for stochastic shortest path

A Cohen, Y Efroni, Y Mansour… - Advances in neural …, 2021 - proceedings.neurips.cc
We study the Stochastic Shortest Path (SSP) problem in which an agent has to
reach a goal state in minimum total expected cost. In the learning formulation of the problem …
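Note: in the learning formulation, performance over $K$ episodes is usually measured by the SSP regret (standard definition, paraphrased): the total cost incurred minus $K$ times the optimal expected cost from the initial state,
$$R_K = \sum_{k=1}^{K} \sum_{i=1}^{I_k} c(s_i^k, a_i^k) \;-\; K \cdot V^{\star}(s_{\mathrm{init}}),$$
where $I_k$ is the number of steps taken in episode $k$.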

Implicit finite-horizon approximation and efficient optimal algorithms for stochastic shortest path

L Chen, M Jafarnia-Jahromi… - Advances in Neural …, 2021 - proceedings.neurips.cc
We introduce a generic template for developing regret minimization algorithms in the
Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as …