A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns arise when deploying RL in real-world …

A review of safe reinforcement learning: Methods, theories and applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns arise when deploying RL in real-world …
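
For context, much of the work covered by this review is framed as a constrained Markov decision process (CMDP). Under standard notation (assumed here, not quoted from the abstract), the problem reads

\[
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d,
\]

where c is a per-step cost signal and d is a prescribed safety budget.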

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
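
For context, a linear MDP in this line of work is usually taken to mean that transitions and rewards are linear in a known d-dimensional feature map; under the standard (assumed) notation,

\[
\mathbb{P}_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle,
\qquad
r_h(s, a) = \langle \phi(s, a), \theta_h \rangle,
\]

with unknown measures \mu_h and vectors \theta_h; "time-inhomogeneous" refers to the dependence on the step index h.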

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics, 2024, Vol. 52, No. 1, pp. 233–260. https://doi.org/10.1214/23-AOS2342 …

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Gu, Y Chi - Advances in Neural …, 2021 - proceedings.neurips.cc
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …
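
As a reminder of what regret optimality refers to in the finite-horizon episodic setting, the cumulative regret over K episodes is typically defined (standard notation, assumed here) as

\[
\mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{\star}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Big),
\]

where V_1^{\star} is the optimal value function, \pi_k is the policy played in episode k, and s_1^{k} is that episode's initial state.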

MADE: Exploration via maximizing deviation from explored regions

T Zhang, P Rashidinejad, J Jiao… - Advances in …, 2021 - proceedings.neurips.cc
In online reinforcement learning (RL), efficient exploration remains particularly challenging
in high-dimensional environments with sparse rewards. In low-dimensional environments …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …
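
The linear mixture assumption mentioned in this snippet is commonly written as the transition kernel being a linear combination of known basis kernels; a standard (assumed) form is

\[
\mathbb{P}(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta^{\star} \rangle = \sum_{i=1}^{d} \phi_i(s' \mid s, a)\, \theta_i^{\star},
\]

where \phi is a known feature map over (s, a, s') triples and \theta^{\star} \in \mathbb{R}^{d} is the unknown parameter the learner must estimate.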

Learning adversarial low-rank Markov decision processes with unknown transition and full-information feedback

C Zhao, R Yang, B Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study low-rank MDPs with adversarially changing losses in the full-
information feedback setting. In particular, the unknown transition probability kernel admits a …
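
A low-rank MDP is typically defined like a linear MDP but with both factors unknown; under standard (assumed) conventions,

\[
\mathbb{P}(s' \mid s, a) = \langle \phi^{\star}(s, a), \mu^{\star}(s') \rangle,
\]

where both the representation \phi^{\star} and the measure \mu^{\star} must be learned from data, which is what distinguishes this setting from linear MDPs with a known feature map.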

On the sample complexity of learning infinite-horizon discounted linear kernel MDPs

Y Chen, J He, Q Gu - International Conference on Machine …, 2022 - proceedings.mlr.press
We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where
the transition probability function is linear in a predefined feature mapping. Existing …
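
The term linear kernel MDP is generally used interchangeably with linear mixture MDP: the transition probability is linear in a known feature map over (s, a, s') triples. In the infinite-horizon discounted setting studied here, sample complexity is usually measured against the discounted value (standard definitions, assumed here):

\[
V^{\pi}(s) = \mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\Big|\, s_0 = s \Big],
\]

with an algorithm returning an \epsilon-optimal policy if its output \pi satisfies V^{\pi}(s) \ge V^{\star}(s) - \epsilon.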

A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation

H Zhao, J He, Q Gu - arXiv preprint arXiv:2311.15238, 2023 - arxiv.org
The exploration-exploitation dilemma has been a central challenge in reinforcement
learning (RL) with complex model classes. In this paper, we propose a new algorithm …
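
The low-switching property refers to bounding how often the deployed policy changes over K episodes; the usual switching-cost measure (standard definition, assumed here) is

\[
N_{\mathrm{switch}} = \sum_{k=1}^{K-1} \mathbf{1}\{ \pi_{k+1} \neq \pi_k \},
\]

i.e., the number of episodes after which the algorithm updates its policy. Low-switching algorithms aim to keep this quantity (poly-)logarithmic in K while retaining near-optimal regret.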