An overview of reinforcement learning-based approaches for smart home energy management systems with energy storages

W Pinthurat, T Surinkaew, B Hredzak - Renewable and Sustainable Energy …, 2024 - Elsevier
The paper's state-of-the-art review focuses on an in-depth evaluation of smart home energy
management systems which employ reinforcement learning-based methods to integrate …

Policy regularization with dataset constraint for offline reinforcement learning

Y Ran, YC Li, F Zhang, Z Zhang… - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of learning the best possible policy from a fixed dataset, known as
offline Reinforcement Learning (RL). A common taxonomy of existing offline RL works is …

Off-policy average reward actor-critic with deterministic policy search

N Saxena, S Khastagir, S Kolathaya… - International …, 2023 - proceedings.mlr.press
The average reward criterion is relatively less studied as most existing works in the
Reinforcement Learning literature consider the discounted reward criterion. There are few …

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org
Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …

DTN-assisted dynamic cooperative slicing for delay-sensitive service in MEC-enabled IoT via deep deterministic policy gradient with variable action

L Li, L Tang, Q Liu, Y Wang, X He… - IEEE Internet of Things …, 2023 - ieeexplore.ieee.org
Network slicing (NS) provides customized services to users of the Internet of Things (IoT) by
creating logical virtual networks, and NS combined with multiaccess edge computing (MEC) …

Learning Optimal Deterministic Policies with Stochastic Policy Gradients

A Montenegro, M Mussi, AM Metelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Policy gradient (PG) methods are successful approaches to deal with continuous
reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by …

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

T Liu, Y Li, Y Lan, H Gao, W Pan, X Xu - arXiv preprint arXiv:2405.19909, 2024 - arxiv.org
In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced.
To address this, existing methods often constrain the learned policy through policy …

Policy Learning based on Deep Koopman Representation

W Hao, PC Heredia, B Huang, Z Lu, Z Liang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper proposes a policy learning algorithm based on the Koopman operator theory and
policy gradient approach, which seeks to approximate an unknown dynamical system and …

Multi-Agent Interpolated Policy Gradients

Y Li, G Xie - openreview.net
Policy gradient method typically suffers high variance, which is further amplified in the
multi-agent setting due to the exponential explosive growth of the joint action space. While value …