Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Abstract: Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
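
For reference, a minimal NumPy sketch of what the two layers named in this snippet compute (standard definitions, without the learnable scale and shift parameters; this is background only, not the paper's sharpness-reduction analysis):

```python
# Minimal sketch: forward passes of Batch Normalization and Layer Normalization
# (standard definitions, no learnable scale/shift, training-mode statistics).
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature across the batch dimension
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # normalize each example across its feature dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((4, 8))   # (batch, features)
print(batch_norm(x).std(axis=0).round(3))              # ~1 for every feature
print(layer_norm(x).std(axis=-1).round(3))             # ~1 for every example
```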

Implicit bias of gradient descent on reparametrized models: On equivalence to mirror descent

Z Li, T Wang, JD Lee, S Arora - Advances in Neural …, 2022 - proceedings.neurips.cc
As part of the effort to understand implicit bias of gradient descent in overparametrized
models, several results have shown how the training trajectory on the overparametrized …

Incremental learning in diagonal linear networks

R Berthier - Journal of Machine Learning Research, 2023 - jmlr.org
Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they
consist of a quadratic reparametrization of linear regression inducing a sparse implicit …
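
As a rough illustration of the reparametrization described here (a sketch under assumed problem sizes, not code from the paper): a diagonal linear network writes the regression weights as w = u*u - v*v and runs gradient descent on (u, v); from a small initialization the fitted w tends to be sparse.

```python
# Sketch of a diagonal linear network: linear regression with w = u*u - v*v,
# trained by gradient descent on (u, v) from a small initialization.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 100, 3                      # illustrative sizes: samples, features, true sparsity
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:k] = 1.0
y = X @ w_true

alpha, lr, steps = 1e-3, 1e-2, 20000      # small init scale alpha drives the sparse bias
u = alpha * np.ones(d)
v = alpha * np.ones(d)
for _ in range(steps):
    w = u * u - v * v
    grad_w = X.T @ (X @ w - y) / n        # gradient of the squared loss w.r.t. w
    u -= lr * 2 * u * grad_w              # chain rule through w = u*u - v*v
    v += lr * 2 * v * grad_w
print("coordinates above 1e-2:", int(np.sum(np.abs(u * u - v * v) > 1e-2)))
```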

Implicit regularization in hierarchical tensor factorization and deep convolutional neural networks

N Razin, A Maman, N Cohen - International Conference on …, 2022 - proceedings.mlr.press
In the pursuit of explaining implicit regularization in deep learning, prominent focus was
given to matrix and tensor factorizations, which correspond to simplified neural networks. It …

Tensor-on-tensor regression: Riemannian optimization, over-parameterization, statistical-computational gap, and their interplay

Y Luo, AR Zhang - arXiv preprint arXiv:2206.08756, 2022 - arxiv.org
We study the tensor-on-tensor regression, where the goal is to connect tensor responses to
tensor covariates with a low Tucker rank parameter tensor/matrix without the prior …
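
To make the data model in this snippet concrete, here is a small NumPy sketch of the special case with scalar responses (sizes, noise level, and the Gaussian design are assumptions for illustration; this is only the model, not the paper's Riemannian or over-parameterized estimators): the parameter tensor has low Tucker rank, i.e. a small core contracted with a factor matrix along each mode.

```python
# Sketch of the low-Tucker-rank regression model y_i = <X_i, B> + noise,
# with B = G x_1 U1 x_2 U2 x_3 U3 for a small core G and factor matrices U_k.
import numpy as np

rng = np.random.default_rng(0)
dims, ranks, n = (10, 12, 8), (2, 3, 2), 200                # assumed sizes

G = rng.standard_normal(ranks)                              # Tucker core
U = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]
B = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])     # low-Tucker-rank parameter

X = rng.standard_normal((n,) + dims)                        # tensor covariates
y = np.einsum('nijk,ijk->n', X, B) + 0.1 * rng.standard_normal(n)
print(X.shape, y.shape)                                     # (200, 10, 12, 8) (200,)
```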

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement.

N De La Vega, N Razin… - Advances in Neural …, 2024 - proceedings.neurips.cc
The question of what makes a data distribution suitable for deep learning is a fundamental
open problem. Focusing on locally connected neural networks (a prevalent family of …

Algorithmic regularization in tensor optimization: towards a lifted approach in matrix sensing

Z Ma, J Lavaei, S Sojoudi - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Gradient descent (GD) is crucial for generalization in machine learning models, as it induces
implicit regularization, promoting compact representations. In this work, we examine the role …
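
For context, a small NumPy sketch of the (unlifted) matrix sensing setup that this line of work studies, with gradient descent on a factored parametrization X = U U^T from small initialization (problem sizes and step size are assumptions; the paper's lifted tensor formulation is not shown):

```python
# Sketch of matrix sensing: recover a low-rank M from measurements y_i = <A_i, M>
# by gradient descent on an over-parameterized factorization X = U @ U.T.
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 20, 2, 600                            # assumed sizes
U_true = rng.standard_normal((d, r))
M = U_true @ U_true.T                           # ground-truth low-rank matrix
A = rng.standard_normal((m, d, d))              # Gaussian sensing matrices
y = np.einsum('mij,ij->m', A, M)

U = 0.01 * rng.standard_normal((d, d))          # full-size factor, small initialization
lr = 0.002
for _ in range(3000):
    R = np.einsum('mij,ij->m', A, U @ U.T) - y  # measurement residuals
    G = np.einsum('m,mij->ij', R, A) / m        # gradient w.r.t. X = U @ U.T
    U -= lr * (G + G.T) @ U                     # chain rule through the factorization
print("relative error:", np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```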

Plateau in Monotonic Linear Interpolation---A "Biased" View of Loss Landscape for Deep Networks

X Wang, AN Wang, M Zhou, R Ge - The Eleventh International …, 2022 - openreview.net
Monotonic linear interpolation (MLI)---on the line connecting a random initialization with the
minimizer it converges to, the loss and accuracy are monotonic---is a phenomenon that is …
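
A minimal sketch of the interpolation itself (using logistic regression as a stand-in model with assumed sizes, not the paper's experiments): evaluate the training loss along the segment between a random initialization and the parameters gradient descent reaches from it.

```python
# Sketch of monotonic linear interpolation: loss along the line between a random
# initialization w0 and the (approximate) minimizer w reached by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
y = (X @ rng.standard_normal(20) > 0).astype(float)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))            # logistic regression as a stand-in model
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

w0 = 0.1 * rng.standard_normal(20)              # random initialization
w = w0.copy()
for _ in range(2000):                           # plain gradient descent
    w -= 0.5 * grad(w)

alphas = np.linspace(0.0, 1.0, 11)
mli = [loss((1 - a) * w0 + a * w) for a in alphas]
print(np.round(mli, 3))                         # typically decreases monotonically
```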

Behind the scenes of gradient descent: A trajectory analysis via basis function decomposition

J Ma, L Guo, S Fattahi - arXiv preprint arXiv:2210.00346, 2022 - arxiv.org
This work analyzes the solution trajectory of gradient-based algorithms via a novel basis
function decomposition. We show that, although solution trajectories of gradient-based …

Learning from low rank tensor data: A random tensor theory perspective

MEA Seddik, M Tiomoko… - Uncertainty in …, 2023 - proceedings.mlr.press
Under a simplified data model, this paper provides a theoretical analysis of learning from
data that have an underlying low-rank tensor structure in both supervised and unsupervised …