Reward-free exploration is a reinforcement learning setting recently studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our …
M Henaff, M Jiang, R Raileanu - International Conference on …, 2023 - proceedings.mlr.press
Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses …
Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of …
Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy …
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of …
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We …
L Chen, H Luo, CY Wei - Conference on Learning Theory, 2021 - proceedings.mlr.press
We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is $O(\sqrt{DT_\star K})$ and $O(\sqrt{DT_\star SA K})$ …
L Chen, H Luo - International Conference on Machine …, 2021 - proceedings.mlr.press
We make significant progress toward the stochastic shortest path problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve $ O (\sqrt {S …
L Chen, R Jain, H Luo - International Conference on …, 2022 - proceedings.mlr.press
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al …