Refined policy improvement bounds for MDPs

3 results (0.02 seconds)

[PDF] neurips.cc

Policy optimization for continuous reinforcement learning

H Zhao, W Tang, D Yao - Advances in Neural Information …, 2024 - proceedings.neurips.cc

We study reinforcement learning (RL) in the setting of continuous time and space, for an
infinite horizon with a discounted objective and the underlying dynamics driven by a …

Cited by 18 · Related articles · All 6 versions

[PDF] arxiv.org

Average-reward reinforcement learning with trust region methods

X Ma, X Tang, L Xia, J Yang, Q Zhao - arXiv preprint arXiv:2106.03442, 2021 - arxiv.org

Most of reinforcement learning algorithms optimize the discounted criterion which is
beneficial to accelerate the convergence and reduce the variance of estimates. Although the …

Cited by 19 · Related articles · All 5 versions

[PDF] arxiv.org

Processing Network Controls via Deep Reinforcement Learning

M Gluzman - 2022 - search.proquest.com

Novel advanced policy gradient (APG) algorithms, such as proximal policy optimization
(PPO), trust region policy optimization, and their variations, have become the dominant …
