Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …

Stochastic shortest path: Minimax, parameter-free and towards horizon-free regret

J Tarbouriech, R Zhou, SS Du… - Advances in neural …, 2021 - proceedings.neurips.cc
We study the problem of learning in the stochastic shortest path (SSP) setting, where an
agent seeks to minimize the expected cost accumulated before reaching a goal state. We …

Implicit finite-horizon approximation and efficient optimal algorithms for stochastic shortest path

L Chen, M Jafarnia-Jahromi… - Advances in Neural …, 2021 - proceedings.neurips.cc
We introduce a generic template for developing regret minimization algorithms in the
Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as …

A provably efficient sample collection strategy for reinforcement learning

J Tarbouriech, M Pirotta, M Valko… - Advances in Neural …, 2021 - proceedings.neurips.cc
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade
off the exploration of the environment and the exploitation of the samples to optimize its …

Sample complexity bounds for stochastic shortest path with a generative model

J Tarbouriech, M Pirotta, M Valko… - Algorithmic Learning …, 2021 - proceedings.mlr.press
We consider the objective of computing an $\epsilon$-optimal policy in a stochastic shortest
path (SSP) setting, provided that we can access a generative sampling oracle. We propose …

Reaching goals is hard: Settling the sample complexity of the stochastic shortest path

L Chen, A Tirinzoni, M Pirotta… - … on Algorithmic Learning …, 2023 - proceedings.mlr.press
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic
Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner …

Adaptive multi-goal exploration

J Tarbouriech, OD Domingues… - International …, 2022 - proceedings.mlr.press
We introduce a generic strategy for provably efficient multi-goal exploration. It relies on
AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching …

Near-optimal algorithms for autonomous exploration and multi-goal stochastic shortest path

H Cai, T Ma, S Du - International Conference on Machine …, 2022 - proceedings.mlr.press
We revisit the incremental autonomous exploration problem proposed by Lim and Auer
(2012). In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies …

Layered state discovery for incremental autonomous exploration

L Chen, A Tirinzoni, A Lazaric… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this
setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set …