Robust training under label noise by over-parameterization

S Liu, Z Zhu, Q Qu, C You - International Conference on …, 2022 - proceedings.mlr.press
Recently, over-parameterized deep networks, with increasingly more network parameters
than training samples, have dominated the performance of modern machine learning …

The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon that machine learning models with Transformer
architectures have sparse activation maps. By activation map we refer to the …

Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing

M Soltanolkotabi, D Stöger… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Recently, there has been significant progress in understanding the convergence and
generalization properties of gradient-based methods for training overparameterized learning …

Implicit regularization in hierarchical tensor factorization and deep convolutional neural networks

N Razin, A Maman, N Cohen - International Conference on …, 2022 - proceedings.mlr.press
In the pursuit of explaining implicit regularization in deep learning, prominent focus was
given to matrix and tensor factorizations, which correspond to simplified neural networks. It …

Incremental learning in diagonal linear networks

R Berthier - Journal of Machine Learning Research, 2023 - jmlr.org
Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they
consist of a quadratic reparametrization of linear regression inducing a sparse implicit …
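A minimal sketch may help make the quadratic reparametrization concrete. The snippet below implements one common diagonal-linear-network form, beta = u ⊙ u - v ⊙ v, trained by plain gradient descent on a least-squares loss from small initialization; the particular parametrization, initialization scale, and step size are illustrative assumptions rather than details drawn from this specific paper.

```python
import numpy as np

# Minimal diagonal linear network (DLN) sketch: linear regression X @ beta ≈ y,
# reparametrized quadratically as beta = u*u - v*v and trained by gradient descent.
# Small initialization is known to bias the recovered solution towards sparsity.
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0                      # sparse ground truth
y = X @ beta_true

alpha = 1e-3                             # small initialization scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 1e-3

for _ in range(20000):
    beta = u * u - v * v
    grad_beta = X.T @ (X @ beta - y) / n  # gradient of 0.5/n * ||X beta - y||^2 w.r.t. beta
    u -= lr * 2 * u * grad_beta           # chain rule through beta = u*u - v*v
    v -= lr * (-2 * v) * grad_beta

beta_hat = u * u - v * v
print("largest-magnitude coordinates:", np.argsort(-np.abs(beta_hat))[:5])
```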

Smoothing the edges: a general framework for smooth optimization in sparse regularization using Hadamard overparametrization

C Kolb, CL Müller, B Bischl… - arXiv preprint arXiv …, 2023 - researchgate.net
This paper presents a framework for smooth optimization of objectives with ℓ_q and ℓ_{p,q}
regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly …
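As a concrete illustration of Hadamard overparametrization, the sketch below uses the classical identity that minimizing (||u||^2 + ||v||^2)/2 over all factorizations u ⊙ v = beta equals ||beta||_1: writing beta = u ⊙ v and adding smooth ridge penalties on u and v gives a differentiable surrogate of a lasso objective that plain gradient descent can minimize. The loss, variable names, and step size here are illustrative assumptions, not the paper's framework.

```python
import numpy as np

# Hadamard overparametrization sketch: solve a lasso-type problem
#   min_beta 0.5/n * ||X beta - y||^2 + lam * ||beta||_1
# by writing beta = u * v (elementwise) and minimizing the *smooth* surrogate
#   0.5/n * ||X (u*v) - y||^2 + 0.5 * lam * (||u||^2 + ||v||^2),
# which shares its global minimizers since 0.5*(u_i^2 + v_i^2) >= |u_i * v_i|.
rng = np.random.default_rng(1)
n, d, lam = 100, 20, 0.1
X = rng.standard_normal((n, d))
y = X @ (rng.standard_normal(d) * (rng.random(d) < 0.3)) + 0.01 * rng.standard_normal(n)

u = 0.1 * rng.standard_normal(d)
v = 0.1 * rng.standard_normal(d)
lr = 1e-3

for _ in range(50000):
    beta = u * v
    grad_beta = X.T @ (X @ beta - y) / n
    grad_u = grad_beta * v + lam * u      # smooth gradients: no subgradient of |.| needed
    grad_v = grad_beta * u + lam * v
    u -= lr * grad_u
    v -= lr * grad_v

print("estimated beta:", np.round(u * v, 3))
```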

Implicit regularization in AI meets generalized hardness of approximation in optimization--Sharp results for diagonal linear networks

JS Wind, V Antun, AC Hansen - arXiv preprint arXiv:2307.07410, 2023 - arxiv.org
Understanding the implicit regularization imposed by neural network architectures and
gradient based optimization methods is a key challenge in deep learning and AI. In this work …

Blessing of depth in linear regression: Deeper models have flatter landscape around the true solution

J Ma, S Fattahi - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This work characterizes the effect of depth on the optimization landscape of linear
regression, showing that, despite their nonconvexity, deeper models have more desirable …

From NeurODEs to AutoencODEs: a mean-field control framework for width-varying neural networks

C Cipriani, M Fornasier, A Scagliotti - European Journal of Applied …, 2024 - cambridge.org
The connection between Residual Neural Networks (ResNets) and continuous-time control
systems (known as NeurODEs) has led to a mathematical analysis of neural networks, which …

Implicit regularization for group sparsity

J Li, TV Nguyen, C Hegde, RKW Wong - arXiv preprint arXiv:2301.12540, 2023 - arxiv.org
We study the implicit regularization of gradient descent towards structured sparsity via a
novel neural reparameterization, which we call a diagonally grouped linear neural network …
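Purely as an illustration of how a groupwise reparameterization can bias gradient descent towards group sparsity, the sketch below attaches a shared nonnegative scale to each group of coefficients, beta_g = g_k^2 · v_g, and trains everything by gradient descent from small initialization; this construction is a hypothetical stand-in and may differ from the diagonally grouped architecture defined in the paper.

```python
import numpy as np

# Illustrative group-sparse reparameterization (hypothetical, not the paper's exact model):
# coefficients are split into groups, and group k is written as beta_k = (g[k]**2) * v_k,
# i.e. a shared nonnegative per-group scale times a per-group direction vector.
rng = np.random.default_rng(2)
n, d, groups = 80, 40, 8                 # 8 groups of 5 coefficients each
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0                      # only the first group is active
y = X @ beta_true
idx = np.split(np.arange(d), groups)

g = 0.1 * np.ones(groups)                # small per-group scales
v = 0.1 * np.ones(d)                     # per-coordinate directions
lr = 1e-2

for _ in range(30000):
    scale = np.concatenate([np.full(len(ix), g[k] ** 2) for k, ix in enumerate(idx)])
    beta = scale * v
    grad_beta = X.T @ (X @ beta - y) / n
    grad_g = np.array([2 * g[k] * np.sum(grad_beta[ix] * v[ix]) for k, ix in enumerate(idx)])
    grad_v = scale * grad_beta
    g -= lr * grad_g
    v -= lr * grad_v

print("per-group scales g^2:", np.round(g ** 2, 4))   # the active group's scale should dominate
```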