Reward-free RL is no harder than reward-aware RL in linear Markov decision processes

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

The importance of non-Markovianity in maximum state entropy exploration

M Mutti, R De Santi, M Restelli - International Conference on …, 2022 - proceedings.mlr.press
In the maximum state entropy exploration framework, an agent interacts with a reward-free
environment to learn a policy that maximizes the entropy of the expected state visitations it is …
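As a rough illustration of the objective named in this snippet, here is a minimal sketch that assumes a finite, hashable state space and estimates the expected state visitations by pooling sampled trajectories. It computes the Shannon entropy (in nats) of that pooled distribution. For contrast, it also computes the average entropy of each trajectory's own visitation distribution, which can differ sharply. Both function names are hypothetical, and this is not presented as the method of the cited paper.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (nats) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def state_visitation_entropy(trajectories):
    """Entropy of the state-visitation distribution pooled over all
    trajectories (an empirical estimate of the expected visitations).
    Assumes at least one non-empty trajectory."""
    return _entropy(Counter(s for traj in trajectories for s in traj))


def mean_trajectory_entropy(trajectories):
    """Average, over trajectories, of the entropy of each trajectory's own
    visitation distribution. Assumes every trajectory is non-empty."""
    return sum(_entropy(Counter(traj)) for traj in trajectories) / len(trajectories)


# Two trajectories that each stay in a single (different) state:
trajs = [[0, 0], [1, 1]]
print(state_visitation_entropy(trajs))  # log 2: pooled visitations are uniform
print(mean_trajectory_entropy(trajs))   # 0: each trajectory alone has zero entropy
```

The example shows why the choice of objective matters. Pooling over trajectories can report high entropy even when no single trajectory covers more than one state.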

Finding the stochastic shortest path with low regret: The adversarial cost and unknown transition case

L Chen, H Luo - International Conference on Machine …, 2021 - proceedings.mlr.press
We make significant progress toward the stochastic shortest path problem with adversarial
costs and unknown transition. Specifically, we develop algorithms that achieve $ O (\sqrt {S …

Active coverage for PAC reinforcement learning

A Al-Marjani, A Tirinzoni… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Collecting and leveraging data with good coverage properties plays a crucial role in different
aspects of reinforcement learning (RL), including reward-free exploration and offline …

Nearly optimal latent state decoding in Block MDPs

Y Jedra, J Lee, A Proutiere… - … Conference on Artificial …, 2023 - proceedings.mlr.press
We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the
decision maker has access to rich observations or contexts generated from a small number …

Reaching goals is hard: Settling the sample complexity of the stochastic shortest path

L Chen, A Tirinzoni, M Pirotta… - … on Algorithmic Learning …, 2023 - proceedings.mlr.press
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic
Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner …

Adaptive multi-goal exploration

J Tarbouriech, OD Domingues… - International …, 2022 - proceedings.mlr.press
We introduce a generic strategy for provably efficient multi-goal exploration. It relies on
AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching …

Unsupervised reinforcement learning via state entropy maximization

M Mutti - 2023 - amsdottorato.unibo.it
Reinforcement Learning (RL) provides a powerful framework to address sequential
decision-making problems in which the transition dynamics is unknown or too complex to be …

Online Regret Bounds for Satisficing in MDPs

H Hajiabolhassan, R Ortner - Sixteenth European Workshop on …, 2023 - openreview.net
We consider general reinforcement learning under the average reward criterion in Markov
decision processes (MDPs) when the learner's goal is not to learn an optimal policy but …