Bilinear classes: A structural framework for provable generalization in RL

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press
Abstract: This work introduces Bilinear Classes, a new structural framework which permits
generalization in reinforcement learning in a wide variety of settings through the use of …

Jump-start reinforcement learning

I Uchendu, T Xiao, Y Lu, B Zhu, M Yan… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) provides a theoretical framework for continuously improving an
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …

Representation learning for online and offline RL in low-rank MDPs

M Uehara, X Zhang, W Sun - arXiv preprint arXiv:2110.04652, 2021 - arxiv.org
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arXiv preprint arXiv:2107.06226, 2021 - arxiv.org
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

FLAMBE: Structural complexity and representation learning of low rank MDPs

A Agarwal, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc
In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …

The complexity of Markov equilibrium in stochastic games

C Daskalakis, N Golowich… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We show that computing approximate stationary Markov coarse correlated equilibria (CCE)
in general-sum stochastic games is PPAD-hard, even when there are two players, the game …

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …

Efficient reinforcement learning in Block MDPs: A model-free representation learning approach

X Zhang, Y Song, M Uehara, M Wang… - International …, 2022 - proceedings.mlr.press
We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision
Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are …

Mitigating covariate shift in imitation learning via offline data with partial coverage

J Chang, M Uehara, D Sreenivas… - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …

NovelD: A simple yet effective exploration criterion

T Zhang, H Xu, X Wang, Y Wu… - Advances in …, 2021 - proceedings.neurips.cc
Efficient exploration under sparse rewards remains a key challenge in deep reinforcement
learning. Previous exploration methods (e.g., RND) have achieved strong results in multiple …