Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …
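The snippet cuts off before the technical content, but the benchmark this line of work keeps returning to is policy gradient on the linear quadratic regulator. Below is a minimal, hypothetical sketch of the model-free variant: the dynamics (A, B), costs (Q, R), horizon, and step sizes are all made-up illustration values, and a simple two-point zeroth-order estimator stands in for the smoothed gradient estimators analyzed in this literature.

```python
# Hypothetical sketch: zeroth-order policy gradient on a tiny discrete-time
# LQR instance. All problem data and hyperparameters are made up.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])          # dynamics: x' = A x + B u
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)         # quadratic state / input costs

def cost(K, horizon=20):
    """Deterministic finite-horizon cost of the static feedback u = -K x."""
    total = 0.0
    for x in np.eye(2):             # roll out from canonical initial states
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total

K = np.zeros((1, 2))
r, lr = 0.05, 1e-4                  # smoothing radius, step size
print("initial cost:", cost(K))
for _ in range(500):
    U = rng.standard_normal(K.shape)                       # random direction
    g = (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U  # two-point estimate
    K -= lr * g
print("final cost:", cost(K), " gain K:", K)
```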

Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
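As a concrete instance of the factored, nonconvex approach the overview surveys, here is a minimal sketch of gradient descent on a matrix-completion loss f(U, V) = ½‖P_Ω(UVᵀ − M)‖²_F with the customary spectral initialization. The sizes, sampling rate, and step size are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of factored nonconvex matrix completion: gradient descent on
# f(U, V) = 0.5 * ||mask * (U V^T - M)||_F^2 after spectral initialization.
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 100, 80, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank-r truth
mask = (rng.random((n, m)) < 0.3).astype(float)                 # ~30% observed
p = mask.mean()

# Spectral initialization: top-r SVD of the (rescaled) observed matrix
Us, s, Vts = np.linalg.svd(mask * M / p, full_matrices=False)
U = Us[:, :r] * np.sqrt(s[:r])
V = Vts[:r].T * np.sqrt(s[:r])

eta = 0.002
for _ in range(500):
    Rres = mask * (U @ V.T - M)                       # residual on observed entries
    U, V = U - eta * Rres @ V, V - eta * Rres.T @ U   # exact gradients of f
print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```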

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
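The snippet truncates the definition, but the effect is easy to reproduce on a toy linear model of my own construction (not the paper's setup): two redundant features, one with a larger margin. Gradient descent pours weight into the strong feature, and once the loss is small, the weak feature's gradient dries up.

```python
# Toy illustration of the starvation effect on a linear model: two redundant
# features, one with a larger margin; the weak feature's gradient is starved.
import numpy as np

rng = np.random.default_rng(2)
n = 500
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([3.0 * y + 0.1 * rng.standard_normal(n),   # strong feature
              0.5 * y + 0.1 * rng.standard_normal(n)],  # weak feature
             axis=1)

w = np.zeros(2)
for _ in range(2000):
    margins = y * (X @ w)
    # gradient of the mean logistic loss log(1 + exp(-margin))
    g = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= 0.1 * g
print("weights [strong, weak]:", w)   # weight on the strong feature dominates
```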

Implicit regularization in deep matrix factorization

S Arora, N Cohen, W Hu, Y Luo - Advances in Neural …, 2019 - proceedings.neurips.cc
Efforts to understand the generalization mystery in deep learning have led to the belief that
gradient-based optimization induces a form of implicit regularization, a bias towards models …
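A minimal sketch of the deep (here depth-3) factorization the paper studies, with assumed sizes and hyperparameters: fit the observed entries of a low-rank matrix by W3 W2 W1 from a small random initialization, then inspect the singular values of the product; most of the spectrum stays near zero, the implicit low-rank bias.

```python
# Sketch: depth-3 matrix factorization fit to partially observed entries of a
# rank-2 matrix from small random init; the learned product stays near low rank.
import numpy as np

rng = np.random.default_rng(3)
n, r = 30, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-2 target
mask = (rng.random((n, n)) < 0.5).astype(float)                 # observed half

scale = 0.1                                      # small init drives the bias
W1, W2, W3 = (scale * rng.standard_normal((n, n)) for _ in range(3))

eta = 0.005
for _ in range(5000):
    Res = mask * (W3 @ W2 @ W1 - M)              # residual on observed entries
    G1 = (W3 @ W2).T @ Res                       # chain-rule gradients
    G2 = W3.T @ Res @ W1.T
    G3 = Res @ (W2 @ W1).T
    W1, W2, W3 = W1 - eta * G1, W2 - eta * G2, W3 - eta * G3
print("top singular values of the learned product:",
      np.linalg.svd(W3 @ W2 @ W1, compute_uv=False)[:5].round(2))
```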

Learning overparameterized neural networks via stochastic gradient descent on structured data

Y Li, Y Liang - Advances in neural information processing …, 2018 - proceedings.neurips.cc
Neural networks have many successful applications, yet far less theoretical
understanding has been gained. Towards bridging this gap, we study the problem of …

Lower bounds for non-convex stochastic optimization

Y Arjevani, Y Carmon, JC Duchi, DJ Foster… - Mathematical …, 2023 - Springer
We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ)
using stochastic first-order methods. In a well-studied model where algorithms access …
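Spelled out, the object being counted is an ϵ-stationary point of a smooth nonconvex F, located through an unbiased stochastic gradient oracle with bounded variance; if memory serves, the paper's headline lower bound in this model is Ω(ϵ⁻⁴) oracle calls, matching SGD.

```latex
% \epsilon-stationarity and the stochastic first-order oracle model (sketch)
\[
  \text{find } x \ \text{with} \ \|\nabla F(x)\| \le \epsilon,
  \qquad \text{given } g(x,\xi) \ \text{with} \
  \mathbb{E}_\xi\, g(x,\xi) = \nabla F(x), \quad
  \mathbb{E}_\xi \|g(x,\xi) - \nabla F(x)\|^2 \le \sigma^2 .
\]
```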

Spectral methods for data science: A statistical perspective

Y Chen, Y Chi, J Fan, C Ma - Foundations and Trends® in …, 2021 - nowpublishers.com
Spectral methods have emerged as a simple yet surprisingly effective approach for
extracting information from massive, noisy and incomplete data. In a nutshell, spectral …
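In its simplest "nutshell" form, a spectral method estimates a planted structure from the leading eigenvector of an observed matrix. A toy rank-one signal-plus-Wigner-noise sketch, with made-up dimension and signal strength:

```python
# A spectral method in a nutshell (toy setting): observe Y = lam * v v^T + W
# and estimate the planted direction v by the top eigenvector of Y.
import numpy as np

rng = np.random.default_rng(4)
n, lam = 400, 8.0
v = rng.standard_normal(n); v /= np.linalg.norm(v)               # planted direction
W = rng.standard_normal((n, n)); W = (W + W.T) / np.sqrt(2 * n)  # Wigner noise
Y = lam * np.outer(v, v) + W

vals, vecs = np.linalg.eigh(Y)    # eigenvalues in ascending order
v_hat = vecs[:, -1]               # top eigenvector
print("correlation |<v_hat, v>|:", abs(v_hat @ v))
```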

Gradient descent maximizes the margin of homogeneous neural networks

K Lyu, J Li - arXiv preprint arXiv:1906.05890, 2019 - arxiv.org
In this paper, we study the implicit regularization of the gradient descent algorithm in
homogeneous neural networks, including fully-connected and convolutional neural …
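For context, the quantity the paper shows gradient descent implicitly maximizes is, to my understanding of the setup, the normalized margin of an L-homogeneous network:

```latex
% f is L-homogeneous: f(x; c\theta) = c^{L} f(x; \theta) for all c > 0.
% Gradient descent on classification losses drives up the normalized margin
\[
  \bar{\gamma}(\theta) \;=\; \frac{\min_i \, y_i f(x_i;\theta)}{\|\theta\|_2^{L}},
\]
% where (x_i, y_i) are the training examples with labels y_i \in \{\pm 1\}.
```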

Entrywise eigenvector analysis of random matrices with low expected rank

E Abbe, J Fan, K Wang, Y Zhong - Annals of statistics, 2020 - ncbi.nlm.nih.gov
Recovering low-rank structures via eigenvector perturbation analysis is a common problem
in statistical machine learning, such as in factor analysis, community detection, ranking …
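As a toy instance of the entrywise question: in a balanced two-community stochastic block model, exact recovery amounts to the second eigenvector of the adjacency matrix carrying the correct sign in every coordinate, which is precisely what an ℓ∞ (entrywise) perturbation bound certifies. Parameters below are illustrative.

```python
# Toy instance: two-community stochastic block model; communities read off
# entrywise from the sign pattern of the adjacency matrix's second eigenvector.
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 400, 0.10, 0.02                 # within/between edge probabilities
z = np.repeat([1, -1], n // 2)            # planted community labels
P = np.where(np.outer(z, z) > 0, p, q)    # expected adjacency (low rank)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric, no self-loops

vals, vecs = np.linalg.eigh(A)
u2 = vecs[:, -2]                          # second-largest eigenvector
z_hat = np.sign(u2)
acc = max(np.mean(z_hat == z), np.mean(z_hat == -z))  # up to a global sign flip
print("fraction of correctly labeled nodes:", acc)
```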

MARINA: Faster non-convex distributed learning with compression

E Gorbunov, KP Burlachenko, Z Li… - … on Machine Learning, 2021 - proceedings.mlr.press
We develop and analyze MARINA: a new communication efficient method for non-convex
distributed learning over heterogeneous datasets. MARINA employs a novel communication …
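A sketch of MARINA's defining ingredient in a toy single-process simulation: workers usually transmit only a compressed gradient difference, with an occasional full-gradient round (probability p). The quadratic local losses and the unbiased rand-k compressor are my choices for illustration, not the paper's experiments.

```python
# Sketch of the MARINA-style gradient estimator: rare full-gradient rounds,
# otherwise compressed gradient differences averaged across workers.
import numpy as np

rng = np.random.default_rng(6)
d, n_workers, k, p, eta = 50, 10, 5, 0.1, 0.05

# Heterogeneous quadratic local losses f_i(x) = 0.5 * ||x - b_i||^2
bs = [rng.standard_normal(d) for _ in range(n_workers)]
grad = lambda i, x: x - bs[i]

def rand_k(v):
    """Unbiased rand-k compressor: keep k random coords, rescale by d/k."""
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros(d); out[idx] = v[idx] * d / k
    return out

x = rng.standard_normal(d)
g = np.mean([grad(i, x) for i in range(n_workers)], axis=0)  # start exact
for _ in range(500):
    x_new = x - eta * g
    if rng.random() < p:   # rare round: everyone sends the full gradient
        g = np.mean([grad(i, x_new) for i in range(n_workers)], axis=0)
    else:                  # usual round: compressed differences only
        g = g + np.mean([rand_k(grad(i, x_new) - grad(i, x))
                         for i in range(n_workers)], axis=0)
    x = x_new
print("distance to optimum:", np.linalg.norm(x - np.mean(bs, axis=0)))
```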