D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult than bandits, even with a long planning horizon and unknown state transitions. However …
Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the …
Z Zhang, X Ji, S Du - Conference on Learning Theory, 2022 - proceedings.mlr.press
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDPs) that enjoys a regret bound independent of the planning horizon. Specifically …
Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of …
R Zhou, Z Zhang, SS Du - International Conference on …, 2023 - proceedings.mlr.press
We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit …
J Zhang, W Zhang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the …
L Chen, R Jain, H Luo - International Conference on …, 2022 - proceedings.mlr.press
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al …
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem …
We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as …