Robust training under label noise by over-parameterization

S Liu, Z Zhu, Q Qu, C You - International Conference on …, 2022 - proceedings.mlr.press
Recently, over-parameterized deep networks, with increasingly more network parameters
than training samples, have dominated the performance of modern machine learning …

The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon that machine learning models with Transformer
architectures have sparse activation maps. By activation map we refer to the …

Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing

M Soltanolkotabi, D Stöger… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Recently, there has been significant progress in understanding the convergence and
generalization properties of gradient-based methods for training overparameterized learning …

Implicit regularization in hierarchical tensor factorization and deep convolutional neural networks

N Razin, A Maman, N Cohen - International Conference on …, 2022 - proceedings.mlr.press
In the pursuit of explaining implicit regularization in deep learning, prominent focus was
given to matrix and tensor factorizations, which correspond to simplified neural networks. It …

Incremental learning in diagonal linear networks

R Berthier - Journal of Machine Learning Research, 2023 - jmlr.org
Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they
consist of a quadratic reparametrization of linear regression inducing a sparse implicit …
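A minimal sketch may help make the quadratic reparametrization concrete. The snippet below implements one common diagonal-linear-network form, beta = u ⊙ u - v ⊙ v, trained by plain gradient descent on a least-squares loss from small initialization; the particular parametrization, initialization scale, and step size are illustrative assumptions rather than details drawn from this specific paper.

```python
import numpy as np

# Minimal diagonal linear network (DLN) sketch: linear regression X @ beta ≈ y,
# reparametrized quadratically as beta = u*u - v*v and trained by gradient descent.
# Small initialization is known to bias the recovered solution towards sparsity.
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0                      # sparse ground truth
y = X @ beta_true

alpha = 1e-3                             # small initialization scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 1e-3

for _ in range(20000):
    beta = u * u - v * v
    grad_beta = X.T @ (X @ beta - y) / n  # gradient of 0.5/n * ||X beta - y||^2 w.r.t. beta
    u -= lr * 2 * u * grad_beta           # chain rule through beta = u*u - v*v
    v -= lr * (-2 * v) * grad_beta

beta_hat = u * u - v * v
print("largest-magnitude coordinates:", np.argsort(-np.abs(beta_hat))[:5])
```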

Smoothing the edges: a general framework for smooth optimization in sparse regularization using Hadamard overparametrization

C Kolb, CL Müller, B Bischl… - arXiv preprint arXiv …, 2023 - researchgate.net
This paper presents a framework for smooth optimization of objectives with ℓ_q and ℓ_{p,q}
regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly …
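As a concrete illustration of Hadamard overparametrization, the sketch below uses the classical identity that minimizing (||u||^2 + ||v||^2)/2 over all factorizations u ⊙ v = beta equals ||beta||_1: writing beta = u ⊙ v and adding smooth ridge penalties on u and v gives a differentiable surrogate of a lasso objective that plain gradient descent can minimize. The loss, variable names, and step size here are illustrative assumptions, not the paper's framework.

```python
import numpy as np

# Hadamard overparametrization sketch: solve a lasso-type problem
#   min_beta 0.5/n * ||X beta - y||^2 + lam * ||beta||_1
# by writing beta = u * v (elementwise) and minimizing the *smooth* surrogate
#   0.5/n * ||X (u*v) - y||^2 + 0.5 * lam * (||u||^2 + ||v||^2),
# which shares its global minimizers since 0.5*(u_i^2 + v_i^2) >= |u_i * v_i|.
rng = np.random.default_rng(1)
n, d, lam = 100, 20, 0.1
X = rng.standard_normal((n, d))
y = X @ (rng.standard_normal(d) * (rng.random(d) < 0.3)) + 0.01 * rng.standard_normal(n)

u = 0.1 * rng.standard_normal(d)
v = 0.1 * rng.standard_normal(d)
lr = 1e-3

for _ in range(50000):
    beta = u * v
    grad_beta = X.T @ (X @ beta - y) / n
    grad_u = grad_beta * v + lam * u      # smooth gradients: no subgradient of |.| needed
    grad_v = grad_beta * u + lam * v
    u -= lr * grad_u
    v -= lr * grad_v

print("estimated beta:", np.round(u * v, 3))
```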

Implicit regularization in AI meets generalized hardness of approximation in optimization--Sharp results for diagonal linear networks

JS Wind, V Antun, AC Hansen - arXiv preprint arXiv:2307.07410, 2023 - arxiv.org
Understanding the implicit regularization imposed by neural network architectures and
gradient based optimization methods is a key challenge in deep learning and AI. In this work …

Blessing of depth in linear regression: Deeper models have flatter landscape around the true solution

J Ma, S Fattahi - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This work characterizes the effect of depth on the optimization landscape of linear
regression, showing that, despite their nonconvexity, deeper models have more desirable …

From NeurODEs to AutoencODEs: a mean-field control framework for width-varying neural networks

C Cipriani, M Fornasier, A Scagliotti - European Journal of Applied …, 2024 - cambridge.org
The connection between Residual Neural Networks (ResNets) and continuous-time control
systems (known as NeurODEs) has led to a mathematical analysis of neural networks, which …

Implicit regularization for group sparsity

J Li, TV Nguyen, C Hegde, RKW Wong - arXiv preprint arXiv:2301.12540, 2023 - arxiv.org
We study the implicit regularization of gradient descent towards structured sparsity via a
novel neural reparameterization, which we call a diagonally grouped linear neural network …
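Purely as an illustration of how a groupwise reparameterization can bias gradient descent towards group sparsity, the sketch below attaches a shared nonnegative scale to each group of coefficients, beta_g = g_k^2 · v_g, and trains everything by gradient descent from small initialization; this construction is a hypothetical stand-in and may differ from the diagonally grouped architecture defined in the paper.

```python
import numpy as np

# Illustrative group-sparse reparameterization (hypothetical, not the paper's exact model):
# coefficients are split into groups, and group k is written as beta_k = (g[k]**2) * v_k,
# i.e. a shared nonnegative per-group scale times a per-group direction vector.
rng = np.random.default_rng(2)
n, d, groups = 80, 40, 8                 # 8 groups of 5 coefficients each
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0                      # only the first group is active
y = X @ beta_true
idx = np.split(np.arange(d), groups)

g = 0.1 * np.ones(groups)                # small per-group scales
v = 0.1 * np.ones(d)                     # per-coordinate directions
lr = 1e-2

for _ in range(30000):
    scale = np.concatenate([np.full(len(ix), g[k] ** 2) for k, ix in enumerate(idx)])
    beta = scale * v
    grad_beta = X.T @ (X @ beta - y) / n
    grad_g = np.array([2 * g[k] * np.sum(grad_beta[ix] * v[ix]) for k, ix in enumerate(idx)])
    grad_v = scale * grad_beta
    g -= lr * grad_g
    v -= lr * grad_v

print("per-group scales g^2:", np.round(g ** 2, 4))   # the active group's scale should dominate
```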