A provably efficient sample collection strategy for reinforcement learning

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press

Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

Cited by 67 - Related articles - All 7 versions

[PDF] mlr.press

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press

We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Cited by 15 - Related articles - All 9 versions

[PDF] mlr.press

The importance of non-Markovianity in maximum state entropy exploration

M Mutti, R De Santi, M Restelli - International Conference on …, 2022 - proceedings.mlr.press

In the maximum state entropy exploration framework, an agent interacts with a reward-free
environment to learn a policy that maximizes the entropy of the expected state visitations it is …

Cited by 30 - Related articles - All 7 versions

[PDF] mlr.press

Finding the stochastic shortest path with low regret: The adversarial cost and unknown transition case

L Chen, H Luo - International Conference on Machine …, 2021 - proceedings.mlr.press

We make significant progress toward the stochastic shortest path problem with adversarial
costs and unknown transition. Specifically, we develop algorithms that achieve $ O (\sqrt {S …

Cited by 32 - Related articles - All 6 versions

[PDF] mlr.press

Active coverage for PAC reinforcement learning

A Al-Marjani, A Tirinzoni… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

Collecting and leveraging data with good coverage properties plays a crucial role in different
aspects of reinforcement learning (RL), including reward-free exploration and offline …

Cited by 8 - Related articles - All 8 versions

[PDF] mlr.press

Nearly optimal latent state decoding in Block MDPs

Y Jedra, J Lee, A Proutiere… - … Conference on Artificial …, 2023 - proceedings.mlr.press

We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the
decision maker has access to rich observations or contexts generated from a small number …

Cited by 12 - Related articles - All 6 versions

[PDF] mlr.press

Reaching goals is hard: Settling the sample complexity of the stochastic shortest path

L Chen, A Tirinzoni, M Pirotta… - … on Algorithmic Learning …, 2023 - proceedings.mlr.press

We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic
Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner …

Cited by 3 - Related articles - All 3 versions

[PDF] mlr.press

Adaptive multi-goal exploration

J Tarbouriech, OD Domingues… - International …, 2022 - proceedings.mlr.press

We introduce a generic strategy for provably efficient multi-goal exploration. It relies on
AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching …

Cited by 4 - Related articles - All 4 versions

[PDF] unibo.it

Unsupervised reinforcement learning via state entropy maximization

M Mutti - 2023 - amsdottorato.unibo.it

Reinforcement Learning (RL) provides a powerful framework to address sequential decision-
making problems in which the transition dynamics is unknown or too complex to be …

Cited by 3 - Related articles - All 2 versions

[PDF] openreview.net

Online Regret Bounds for Satisficing in MDPs

H Hajiabolhassan, R Ortner - Sixteenth European Workshop on …, 2023 - openreview.net

We consider general reinforcement learning under the average reward criterion in Markov
decision processes (MDPs) when the learner's goal is not to learn an optimal policy but …

Cited by 3 - Related articles - All 2 versions
