No-regret exploration in goal-oriented reinforcement learning

Y Jiang, JZ Kolter, R Raileanu - Advances in Neural …, 2024 - proceedings.neurips.cc

Existing approaches for improving generalization in deep reinforcement learning (RL) have
mostly focused on representation learning, neglecting RL-specific aspects such as …

Cited by 19 · Related articles · All 5 versions

Adaptive reward-free exploration

E Kaufmann, P Ménard… - Algorithmic …, 2021 - proceedings.mlr.press

Reward-free exploration is a reinforcement learning setting recently studied by (Jin et al.
2020), who address it by running several algorithms with regret guarantees in parallel. In our …

Cited by 96 · Related articles · All 9 versions

A study of global and episodic bonuses for exploration in contextual mdps

M Henaff, M Jiang, R Raileanu - International Conference on …, 2023 - proceedings.mlr.press

Exploration in environments which differ across episodes has received increasing attention
in recent years. Current methods use some combination of global novelty bonuses …

Cited by 16 · Related articles · All 6 versions

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press

We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …

Cited by 33 · Related articles · All 9 versions

Geometric entropic exploration

ZD Guo, MG Azar, A Saade, S Thakoor, B Piot… - arXiv preprint arXiv …, 2021 - arxiv.org

Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum
State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy …

Cited by 43 · Related articles · All 2 versions

Near-optimal regret bounds for stochastic shortest path

A Rosenberg, A Cohen, Y Mansour… - … on Machine Learning, 2020 - proceedings.mlr.press

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an
agent has to reach a goal state in minimum total expected cost. In the learning formulation of …

Cited by 58 · Related articles · All 9 versions

Stochastic shortest path: Minimax, parameter-free and towards horizon-free regret

J Tarbouriech, R Zhou, SS Du… - Advances in neural …, 2021 - proceedings.neurips.cc

We study the problem of learning in the stochastic shortest path (SSP) setting, where an
agent seeks to minimize the expected cost accumulated before reaching a goal state. We …

Cited by 36 · Related articles · All 13 versions

Minimax regret for stochastic shortest path with adversarial costs and known transition

L Chen, H Luo, CY Wei - Conference on Learning Theory, 2021 - proceedings.mlr.press

We study the stochastic shortest path problem with adversarial costs and known transition,
and show that the minimax regret is $O(\sqrt{DT_\star K})$ and $O(\sqrt{DT_\star SA K})$ …

Cited by 37 · Related articles · All 4 versions

Finding the stochastic shortest path with low regret: The adversarial cost and unknown transition case

L Chen, H Luo - International Conference on Machine …, 2021 - proceedings.mlr.press

We make significant progress toward the stochastic shortest path problem with adversarial
costs and unknown transition. Specifically, we develop algorithms that achieve $ O (\sqrt {S …

Cited by 32 · Related articles · All 6 versions

Improved no-regret algorithms for stochastic shortest path with linear mdp

L Chen, R Jain, H Luo - International Conference on …, 2022 - proceedings.mlr.press

We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem
with a linear MDP that significantly improve over the only existing results of (Vial et al …

Cited by 17 · Related articles · All 4 versions
