Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Global Convergence of Direct Policy Search for State-Feedback Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

X Guo, B Hu - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Direct policy search has been widely applied in modern reinforcement learning and
continuous control. However, the theoretical properties of direct policy search on nonsmooth …

On the optimization landscape of dynamic output feedback linear quadratic control

J Duan, W Cao, Y Zheng, L Zhao - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The convergence of policy gradient algorithms hinges on the optimization landscape of the
underlying optimal control problem. Theoretical insights into these algorithms can often be …

Controlgym: Large-scale control environments for benchmarking reinforcement learning algorithms

X Zhang, W Mao, S Mowlavi… - 6th Annual Learning …, 2024 - proceedings.mlr.press
We introduce controlgym, a library of thirty-six industrial control settings, and ten infinite-
dimensional partial differential equation (PDE)-based control problems. Integrated within the …

Global convergence of two-timescale actor-critic for solving linear quadratic regulator

X Chen, J Duan, Y Liang, L Zhao - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
The actor-critic (AC) reinforcement learning algorithms have been the powerhouse behind
many challenging applications. Nevertheless, its convergence is fragile in general. To study …

Model-Free μ-Synthesis: A Nonsmooth Optimization Perspective

D Keivan, X Guo, P Seiler, G Dullerud, B Hu - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we revisit model-free policy search on an important robust control benchmark,
namely $\mu$-synthesis. In the general output-feedback setting, there do not exist convex …

A non-iterative approach to linear quadratic static output feedback

HM Escamilla, P Trodden, V Kadirkamanathan - IFAC-PapersOnLine, 2023 - Elsevier
This paper considers the problem of static output feedback (SOF) synthesis for linear time-
invariant (LTI) systems. Static output feedback, and more generally structured controller …

Two-Timescale Optimization Framework for Decentralized Linear-Quadratic Optimal Control

L Feng, YH Ni, X Zhang - arXiv preprint arXiv:2406.11168, 2024 - arxiv.org
This study investigates a decentralized linear-quadratic optimal control problem, and several
approximate separable constrained optimization problems are formulated for the first time …

Accelerated Optimization Landscape of Linear-Quadratic Regulator

L Feng, YH Ni - arXiv preprint arXiv:2307.03590, 2023 - arxiv.org
Linear-quadratic regulator (LQR) is a landmark problem in the field of optimal control, which
is the concern of this paper. Generally, LQR is classified into state-feedback LQR (SLQR) …

Globally Convergent Policy Gradient Methods for Linear Quadratic Control of Partially Observed Systems

F Zhao, X Fu, K You - IFAC-PapersOnLine, 2023 - Elsevier
While the optimization landscape of policy gradient methods has been recently investigated
for partially observed linear systems in terms of both static output feedback and dynamical …