Z Li, T Wang, JD Lee, S Arora - Advances in Neural …, 2022 - proceedings.neurips.cc
As part of the effort to understand the implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized …
R Berthier - Journal of Machine Learning Research, 2023 - jmlr.org
Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they consist in a quadratic reparametrization of linear regression inducing a sparse implicit …
N Razin, A Maman, N Cohen - International Conference on …, 2022 - proceedings.mlr.press
In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It …
Y Luo, AR Zhang - arXiv preprint arXiv:2206.08756, 2022 - arxiv.org
We study the tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix without the prior …
N De La Vega, N Razin… - Advances in Neural …, 2024 - proceedings.neurips.cc
The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of …
Z Ma, J Lavaei, S Sojoudi - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role …
Monotonic linear interpolation (MLI)---on the line connecting a random initialization with the minimizer it converges to, the loss and accuracy are monotonic---is a phenomenon that is …
J Ma, L Guo, S Fattahi - arXiv preprint arXiv:2210.00346, 2022 - arxiv.org
This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based …
Under a simplified data model, this paper provides a theoretical analysis of learning from data that have an underlying low-rank tensor structure in both supervised and unsupervised …