Policy optimization for continuous reinforcement learning

H Zhao, W Tang, D Yao - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We study reinforcement learning (RL) in the setting of continuous time and space, for an
infinite horizon with a discounted objective and the underlying dynamics driven by a …

Average-reward reinforcement learning with trust region methods

X Ma, X Tang, L Xia, J Yang, Q Zhao - arXiv preprint arXiv:2106.03442, 2021 - arxiv.org
Most reinforcement learning algorithms optimize the discounted criterion, which helps
accelerate convergence and reduce the variance of estimates. Although the …

Processing Network Controls via Deep Reinforcement Learning

M Gluzman - 2022 - search.proquest.com
Novel advanced policy gradient (APG) algorithms, such as proximal policy optimization
(PPO), trust region policy optimization, and their variations, have become the dominant …