Reward-free exploration for reinforcement learning

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org

In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

Cited by 277 · Related articles · All 9 versions

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 176 · Related articles · All 6 versions

[PDF] neurips.cc

Is RLHF more difficult than standard RL? A theoretical perspective

Y Wang, Q Liu, C Jin - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Abstract Reinforcement learning from Human Feedback (RLHF) learns from preference
signals, while standard Reinforcement Learning (RL) directly learns from reward signals …

Cited by 15 · Related articles · All 4 versions

[PDF] neurips.cc

FLAMBE: Structural complexity and representation learning of low rank MDPs

A Agarwal, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …

Cited by 265 · Related articles · All 10 versions

[PDF] neurips.cc

BYOL-Explore: Exploration by bootstrapped prediction

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

Cited by 60 · Related articles · All 5 versions

[PDF] neurips.cc

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Cited by 27 · Related articles · All 10 versions

[PDF] mlr.press

A sharp analysis of model-based reinforcement learning with self-play

Q Liu, T Yu, Y Bai, C Jin - International Conference on …, 2021 - proceedings.mlr.press

Abstract Model-based algorithms—algorithms that explore the environment through building
and utilizing an estimated model—are widely used in reinforcement learning practice and …

Cited by 146 · Related articles · All 6 versions

[PDF] mlr.press

The complexity of Markov equilibrium in stochastic games

C Daskalakis, N Golowich… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We show that computing approximate stationary Markov coarse correlated equilibria (CCE)
in general-sum stochastic games is PPAD-hard, even when there are two players, the game …

Cited by 70 · Related articles · All 6 versions

[PDF] neurips.cc

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

Abstract Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Cited by 143 · Related articles · All 12 versions

[PDF] arxiv.org

The role of coverage in online reinforcement learning

T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade - arXiv preprint arXiv …, 2022 - arxiv.org

Coverage conditions--which assert that the data logging distribution adequately covers the
state space--play a fundamental role in determining the sample complexity of offline …

Cited by 60 · Related articles · All 4 versions
