prevalent in training over-parameterized deep neural networks for classification tasks.
Existing work has shown that any optimal solution of the trained problem for classification
tasks is an NC solution and has a benign landscape under the unconstrained feature model.
However, these results do not provide an answer to the question of how quickly gradient
descent can find an NC solution. To answer this question, we prove an error bound property …