Y Zheng, Y Sun, M Fazel, N Li - IEEE 61st Conference on Decision and …, 2022 - par.nsf.gov
First-order policy optimization has been widely used in reinforcement learning. It is guaranteed
to find the optimal policy for the state-feedback linear quadratic regulator (LQR). However …