Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Abstract: Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
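
For reference, a minimal NumPy sketch of what the two layers named in this snippet compute (standard definitions, without the learnable scale and shift parameters; this is background only, not the paper's sharpness-reduction analysis):

```python
# Minimal sketch: forward passes of Batch Normalization and Layer Normalization
# (standard definitions, no learnable scale/shift, training-mode statistics).
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature across the batch dimension
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # normalize each example across its feature dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((4, 8))   # (batch, features)
print(batch_norm(x).std(axis=0).round(3))              # ~1 for every feature
print(layer_norm(x).std(axis=-1).round(3))             # ~1 for every example
```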

Implicit bias of gradient descent on reparametrized models: On equivalence to mirror descent

Z Li, T Wang, JD Lee, S Arora - Advances in Neural …, 2022 - proceedings.neurips.cc
As part of the effort to understand implicit bias of gradient descent in overparametrized
models, several results have shown how the training trajectory on the overparametrized …

Incremental learning in diagonal linear networks

R Berthier - Journal of Machine Learning Research, 2023 - jmlr.org
Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they
consist of a quadratic reparametrization of linear regression inducing a sparse implicit …
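
As a rough illustration of the reparametrization described here (a sketch under assumed problem sizes, not code from the paper): a diagonal linear network writes the regression weights as w = u*u - v*v and runs gradient descent on (u, v); from a small initialization the fitted w tends to be sparse.

```python
# Sketch of a diagonal linear network: linear regression with w = u*u - v*v,
# trained by gradient descent on (u, v) from a small initialization.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 100, 3                      # illustrative sizes: samples, features, true sparsity
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:k] = 1.0
y = X @ w_true

alpha, lr, steps = 1e-3, 1e-2, 20000      # small init scale alpha drives the sparse bias
u = alpha * np.ones(d)
v = alpha * np.ones(d)
for _ in range(steps):
    w = u * u - v * v
    grad_w = X.T @ (X @ w - y) / n        # gradient of the squared loss w.r.t. w
    u -= lr * 2 * u * grad_w              # chain rule through w = u*u - v*v
    v += lr * 2 * v * grad_w
print("coordinates above 1e-2:", int(np.sum(np.abs(u * u - v * v) > 1e-2)))
```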

Implicit regularization in hierarchical tensor factorization and deep convolutional neural networks

N Razin, A Maman, N Cohen - International Conference on …, 2022 - proceedings.mlr.press
In the pursuit of explaining implicit regularization in deep learning, prominent focus was
given to matrix and tensor factorizations, which correspond to simplified neural networks. It …

Tensor-on-tensor regression: Riemannian optimization, over-parameterization, statistical-computational gap, and their interplay

Y Luo, AR Zhang - arXiv preprint arXiv:2206.08756, 2022 - arxiv.org
We study the tensor-on-tensor regression, where the goal is to connect tensor responses to
tensor covariates with a low Tucker rank parameter tensor/matrix without the prior …
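
To make the data model in this snippet concrete, here is a small NumPy sketch of the special case with scalar responses (sizes, noise level, and the Gaussian design are assumptions for illustration; this is only the model, not the paper's Riemannian or over-parameterized estimators): the parameter tensor has low Tucker rank, i.e. a small core contracted with a factor matrix along each mode.

```python
# Sketch of the low-Tucker-rank regression model y_i = <X_i, B> + noise,
# with B = G x_1 U1 x_2 U2 x_3 U3 for a small core G and factor matrices U_k.
import numpy as np

rng = np.random.default_rng(0)
dims, ranks, n = (10, 12, 8), (2, 3, 2), 200                # assumed sizes

G = rng.standard_normal(ranks)                              # Tucker core
U = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]
B = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])     # low-Tucker-rank parameter

X = rng.standard_normal((n,) + dims)                        # tensor covariates
y = np.einsum('nijk,ijk->n', X, B) + 0.1 * rng.standard_normal(n)
print(X.shape, y.shape)                                     # (200, 10, 12, 8) (200,)
```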

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement.

N De La Vega, N Razin… - Advances in Neural …, 2024 - proceedings.neurips.cc
The question of what makes a data distribution suitable for deep learning is a fundamental
open problem. Focusing on locally connected neural networks (a prevalent family of …

Algorithmic regularization in tensor optimization: towards a lifted approach in matrix sensing

Z Ma, J Lavaei, S Sojoudi - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Gradient descent (GD) is crucial for generalization in machine learning models, as it induces
implicit regularization, promoting compact representations. In this work, we examine the role …
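
For context, a small NumPy sketch of the (unlifted) matrix sensing setup that this line of work studies, with gradient descent on a factored parametrization X = U U^T from small initialization (problem sizes and step size are assumptions; the paper's lifted tensor formulation is not shown):

```python
# Sketch of matrix sensing: recover a low-rank M from measurements y_i = <A_i, M>
# by gradient descent on an over-parameterized factorization X = U @ U.T.
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 20, 2, 600                            # assumed sizes
U_true = rng.standard_normal((d, r))
M = U_true @ U_true.T                           # ground-truth low-rank matrix
A = rng.standard_normal((m, d, d))              # Gaussian sensing matrices
y = np.einsum('mij,ij->m', A, M)

U = 0.01 * rng.standard_normal((d, d))          # full-size factor, small initialization
lr = 0.002
for _ in range(3000):
    R = np.einsum('mij,ij->m', A, U @ U.T) - y  # measurement residuals
    G = np.einsum('m,mij->ij', R, A) / m        # gradient w.r.t. X = U @ U.T
    U -= lr * (G + G.T) @ U                     # chain rule through the factorization
print("relative error:", np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```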

Plateau in Monotonic Linear Interpolation---A "Biased" View of Loss Landscape for Deep Networks

X Wang, AN Wang, M Zhou, R Ge - The Eleventh International …, 2022 - openreview.net
Monotonic linear interpolation (MLI)---on the line connecting a random initialization with the
minimizer it converges to, the loss and accuracy are monotonic---is a phenomenon that is …
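
A minimal sketch of the interpolation itself (using logistic regression as a stand-in model with assumed sizes, not the paper's experiments): evaluate the training loss along the segment between a random initialization and the parameters gradient descent reaches from it.

```python
# Sketch of monotonic linear interpolation: loss along the line between a random
# initialization w0 and the (approximate) minimizer w reached by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
y = (X @ rng.standard_normal(20) > 0).astype(float)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))            # logistic regression as a stand-in model
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

w0 = 0.1 * rng.standard_normal(20)              # random initialization
w = w0.copy()
for _ in range(2000):                           # plain gradient descent
    w -= 0.5 * grad(w)

alphas = np.linspace(0.0, 1.0, 11)
mli = [loss((1 - a) * w0 + a * w) for a in alphas]
print(np.round(mli, 3))                         # typically decreases monotonically
```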

Behind the scenes of gradient descent: A trajectory analysis via basis function decomposition

J Ma, L Guo, S Fattahi - arXiv preprint arXiv:2210.00346, 2022 - arxiv.org
This work analyzes the solution trajectory of gradient-based algorithms via a novel basis
function decomposition. We show that, although solution trajectories of gradient-based …

Learning from low rank tensor data: A random tensor theory perspective

MEA Seddik, M Tiomoko… - Uncertainty in …, 2023 - proceedings.mlr.press
Under a simplified data model, this paper provides a theoretical analysis of learning from
data that have an underlying low-rank tensor structure in both supervised and unsupervised …