Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Global Convergence of Direct Policy Search for State-Feedback Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

X Guo, B Hu - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Direct policy search has been widely applied in modern reinforcement learning and
continuous control. However, the theoretical properties of direct policy search on nonsmooth …

RL-driven MPPI: Accelerating online control laws calculation with offline policy

Y Qu, H Chu, S Gao, J Guan, H Yan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Model Predictive Path Integral (MPPI) is a recognized sampling-based approach for finite
horizon optimal control problems. However, the efficacy and computational efficiency of …

Escaping high-order saddles in policy optimization for Linear Quadratic Gaussian (LQG) control

Y Zheng, Y Sun, M Fazel, N Li - 2022 IEEE 61st Conference on …, 2022 - ieeexplore.ieee.org
First-order policy optimization has been widely used in reinforcement learning. It is guaranteed
to find the optimal policy for the state-feedback linear quadratic regulator (LQR). However …

Connectivity of the feasible and sublevel sets of dynamic output feedback control with robustness constraints

B Hu, Y Zheng - IEEE Control Systems Letters, 2022 - ieeexplore.ieee.org
This letter considers the optimization landscape of linear dynamic output feedback control
with robustness constraints. We consider the feasible set of all the stabilizing full-order …

Sliding-Mode Control for Perturbed MIMO Systems With Time-Synchronized Convergence

W Jiang, SS Ge, Q Hu, D Li - IEEE Transactions on Cybernetics, 2023 - ieeexplore.ieee.org
This article introduces a novel approach called terminal sliding-mode control for achieving
time-synchronized convergence in multi-input–multi-output (MIMO) systems under …

Mixed policy gradient

Y Guan, J Duan, SE Li, J Li, J Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
Reinforcement learning (RL) has great potential in sequential decision-making. At present,
the mainstream RL algorithms are data-driven, relying on millions of iterations and a large …

Benign nonconvex landscapes in optimal and robust control, Part I: Global optimality

Y Zheng, C Pai, Y Tang - arXiv preprint arXiv:2312.15332, 2023 - arxiv.org
Direct policy search has achieved great empirical success in reinforcement learning. Many
recent studies have revisited its theoretical foundation for continuous control, which reveals …

On the Global Optimality of Direct Policy Search for Nonsmooth Output-Feedback Control

Y Tang, Y Zheng - 2023 62nd IEEE Conference on Decision …, 2023 - ieeexplore.ieee.org
Direct policy search has achieved great empirical success in reinforcement learning.
Recently, there has been increasing interest in studying its theoretical properties for …

Optimization landscape of policy gradient methods for discrete-time static output feedback

J Duan, J Li, X Chen, K Zhao, SE Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In recent times, significant advancements have been made in delving into the optimization
landscape of policy gradient methods for achieving optimal control in linear time-invariant …