Deterministic policy gradient: Convergence analysis

An overview of reinforcement learning-based approaches for smart home energy management systems with energy storages

W Pinthurat, T Surinkaew, B Hredzak - Renewable and Sustainable Energy …, 2024 - Elsevier

The paper's state-of-the-art review focuses on an in-depth evaluation of smart home energy
management systems which employ reinforcement learning-based methods to integrate …

Policy regularization with dataset constraint for offline reinforcement learning

Y Ran, YC Li, F Zhang, Z Zhang… - … Conference on Machine …, 2023 - proceedings.mlr.press

We consider the problem of learning the best possible policy from a fixed dataset, known as
offline Reinforcement Learning (RL). A common taxonomy of existing offline RL works is …

Cited by 12 · Related articles · All 6 versions

Off-policy average reward actor-critic with deterministic policy search

N Saxena, S Khastagir, S Kolathaya… - International …, 2023 - proceedings.mlr.press

The average reward criterion is relatively less studied as most existing works in the
Reinforcement Learning literature consider the discounted reward criterion. There are few …

Cited by 4 · Related articles · All 12 versions

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org

Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …

Cited by 3 · Related articles · All 6 versions

DTN-assisted dynamic cooperative slicing for delay-sensitive service in MEC-enabled IoT via deep deterministic policy gradient with variable action

L Li, L Tang, Q Liu, Y Wang, X He… - IEEE Internet of Things …, 2023 - ieeexplore.ieee.org

Network slicing (NS) provides customized services to users of the Internet of Things (IoT) by
creating logical virtual networks, and NS combined with multiaccess edge computing (MEC) …

Cited by 4 · Related articles

Learning Optimal Deterministic Policies with Stochastic Policy Gradients

A Montenegro, M Mussi, AM Metelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Policy gradient (PG) methods are successful approaches to deal with continuous
reinforcement learning (RL) problems. They learn stochastic parametric (hyper) policies by …

Cited by 1 · Related articles · All 3 versions
