Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Global Convergence of Direct Policy Search for State-Feedback Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

X Guo, B Hu - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Direct policy search has been widely applied in modern reinforcement learning and
continuous control. However, the theoretical properties of direct policy search on nonsmooth …

RL-driven MPPI: Accelerating online control laws calculation with offline policy

Y Qu, H Chu, S Gao, J Guan, H Yan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Model Predictive Path Integral (MPPI) is a recognized sampling-based approach for finite
horizon optimal control problems. However, the efficacy and computational efficiency of …

Escaping high-order saddles in policy optimization for Linear Quadratic Gaussian (LQG) control

Y Zheng, Y Sun, M Fazel, N Li - 2022 IEEE 61st Conference on …, 2022 - ieeexplore.ieee.org
First-order policy optimization has been widely used in reinforcement learning. It guarantees
to find the optimal policy for the state-feedback linear quadratic regulator (LQR). However …

Controlgym: Large-scale control environments for benchmarking reinforcement learning algorithms

X Zhang, W Mao, S Mowlavi… - 6th Annual Learning …, 2024 - proceedings.mlr.press
We introduce controlgym, a library of thirty-six industrial control settings, and ten
infinite-dimensional partial differential equation (PDE)-based control problems. Integrated within the …

Sliding-Mode Control for Perturbed MIMO Systems With Time-Synchronized Convergence

W Jiang, SS Ge, Q Hu, D Li - IEEE Transactions on Cybernetics, 2023 - ieeexplore.ieee.org
This article introduces a novel approach called terminal sliding-mode control for achieving
time-synchronized convergence in multi-input–multi-output (MIMO) systems under …

Connectivity of the feasible and sublevel sets of dynamic output feedback control with robustness constraints

B Hu, Y Zheng - IEEE Control Systems Letters, 2022 - ieeexplore.ieee.org
This letter considers the optimization landscape of linear dynamic output feedback control
with robustness constraints. We consider the feasible set of all the stabilizing full-order …

Policy gradient methods for designing dynamic output feedback controllers

T Sadamoto, T Hirai - European Journal of Control, 2024 - Elsevier
This paper proposes model-based and model-free policy gradient methods (PGMs) for
designing dynamic output feedback controllers for discrete-time partially observable …

Mixed policy gradient

Y Guan, J Duan, SE Li, J Li, J Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
Reinforcement learning (RL) has great potential in sequential decision-making. At present,
the mainstream RL algorithms are data-driven, relying on millions of iterations and a large …

Benign nonconvex landscapes in optimal and robust control, Part I: Global optimality

Y Zheng, C Pai, Y Tang - arXiv preprint arXiv:2312.15332, 2023 - arxiv.org
Direct policy search has achieved great empirical success in reinforcement learning. Many
recent studies have revisited its theoretical foundation for continuous control, which reveals …