Deep neural collapse is provably optimal for the deep unconstrained features model

P Súkeník, M Mondelli… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural collapse (NC) refers to the surprising structure of the last layer of deep neural
networks in the terminal phase of gradient descent training. Recently, an increasing amount …
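
For orientation: the structure this abstract alludes to is usually summarized by four conditions, NC1–NC4 (Papyan, Han & Donoho, 2020). Below is a minimal LaTeX sketch of the first two, in notation we introduce here rather than take from the snippet ($h_{i,c}$ the last-layer feature of sample $i$ in class $c$, $\mu_c$ the class means, $\mu_G$ the global mean, $C$ the number of classes):

  \begin{align*}
    \text{(NC1)}\quad & \Sigma_W := \operatorname{Avg}_{i,c}\,(h_{i,c}-\mu_c)(h_{i,c}-\mu_c)^\top \;\to\; 0, \\
    \text{(NC2)}\quad & \frac{\langle \mu_c-\mu_G,\; \mu_{c'}-\mu_G\rangle}{\lVert \mu_c-\mu_G\rVert\,\lVert \mu_{c'}-\mu_G\rVert} \;\to\; -\frac{1}{C-1} \qquad (c\neq c'),
  \end{align*}

i.e., within-class variability vanishes and the centered class means become an equinorm, equiangular (simplex ETF) configuration; NC3 and NC4 add classifier self-duality and nearest-class-mean decisions.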

Principled and efficient transfer learning of deep models via neural collapse

X Li, S Liu, J Zhou, X Lu, C Fernandez-Granda… - arXiv preprint arXiv …, 2022 - arxiv.org
As model size continues to grow and access to labeled training data remains limited,
transfer learning has become a popular approach in many scientific and engineering fields …

Understanding deep representation learning via layerwise feature compression and discrimination

P Wang, X Li, C Yaras, Z Zhu, L Balzano, W Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Over the past decade, deep learning has proven to be a highly effective tool for learning
meaningful features from raw data. However, it remains an open question how deep …

Neural collapse in multi-label learning with pick-all-label loss

P Li, X Li, Y Wang, Q Qu - arXiv preprint arXiv:2310.15903, 2023 - arxiv.org
We study deep neural networks for the multi-label classification (MLab) task through the lens
of neural collapse (NC). Previous works have been restricted to the multi-class classification …

Average gradient outer product as a mechanism for deep neural collapse

D Beaglehole, P Súkeník, M Mondelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data
representations in the final layers of Deep Neural Networks (DNNs). Though the …
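
The abstract names the average gradient outer product (AGOP) as the proposed mechanism. As a rough illustration only (not the paper's code; jacobian_fn is a hypothetical helper supplied by the caller), the AGOP of a map f averages the gradient outer products J(x)^T J(x) over inputs:

  import numpy as np

  def agop(jacobian_fn, X):
      # Average Gradient Outer Product of a map f: R^d -> R^m,
      # G = (1/n) * sum_i J(x_i)^T J(x_i), where J(x) is the Jacobian of f at x.
      d = X.shape[1]
      G = np.zeros((d, d))
      for x in X:
          J = jacobian_fn(x)   # shape (m, d)
          G += J.T @ J         # accumulate gradient outer products
      return G / X.shape[0]

  # Sanity check on a linear map f(x) = W x, whose Jacobian is W everywhere,
  # so the AGOP equals W^T W regardless of the inputs.
  W = np.random.randn(3, 5)
  X = np.random.randn(10, 5)
  assert np.allclose(agop(lambda x: W, X), W.T @ W)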

The law of parsimony in gradient descent for learning deep linear networks

C Yaras, P Wang, W Hu, Z Zhu, L Balzano… - arXiv preprint arXiv …, 2023 - arxiv.org
Over the past few years, an extensively studied phenomenon in training deep networks is
the implicit bias of gradient descent towards parsimonious solutions. In this work, we …

Linguistic Collapse: Neural Collapse in (Large) Language Models

R Wu, V Papyan - arXiv preprint arXiv:2405.17767, 2024 - arxiv.org
Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where
top-layer representations collapse into their class means, which become equinorm …
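
A common way to quantify the collapse this abstract describes is the within- over between-class variability ratio; here is a minimal NumPy sketch of that metric under the standard definitions (our illustration, not code from the paper):

  import numpy as np

  def nc1_metric(H, y):
      # tr(Sigma_W @ pinv(Sigma_B)): small values indicate that features
      # have collapsed onto their class means.
      # H: (n, d) feature matrix, y: (n,) integer class labels.
      classes = np.unique(y)
      mu_G = H.mean(axis=0)
      d = H.shape[1]
      Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
      for c in classes:
          Hc = H[y == c]
          mu_c = Hc.mean(axis=0)
          Sw += (Hc - mu_c).T @ (Hc - mu_c) / H.shape[0]          # within-class
          Sb += np.outer(mu_c - mu_G, mu_c - mu_G) / len(classes)  # between-class
      return np.trace(Sw @ np.linalg.pinv(Sb))

  # Random features show no collapse (large value); class-constant features would give ~0.
  H = np.random.randn(200, 16)
  y = np.repeat(np.arange(4), 50)
  print(nc1_metric(H, y))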

Residual alignment: uncovering the mechanisms of residual networks

J Li, V Papyan - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
The ResNet architecture has been widely adopted in deep learning due to its significant
boost to performance through the use of simple skip connections, yet the underlying …

A Global Geometric Analysis of Maximal Coding Rate Reduction

P Wang, H Liu, D Pai, Y Yu, Z Zhu, Q Qu… - arXiv preprint arXiv …, 2024 - arxiv.org
The maximal coding rate reduction (MCR$^2$) objective for learning structured and
compact deep representations is drawing increasing attention, especially after its recent …
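
The snippet only names the objective; for reference, the MCR$^2$ objective as stated in the original coding-rate paper (Yu et al., 2020), with $Z\in\mathbb{R}^{d\times n}$ the feature matrix, $\Pi_j$ the diagonal membership matrix of class $j$, and $\epsilon$ a distortion parameter:

  \begin{align*}
    \Delta R(Z,\Pi,\epsilon)
      = \frac{1}{2}\log\det\!\Big(I + \frac{d}{n\epsilon^2}\,ZZ^\top\Big)
      - \sum_{j=1}^{k}\frac{\operatorname{tr}(\Pi_j)}{2n}
        \log\det\!\Big(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^2}\,Z\Pi_j Z^\top\Big),
  \end{align*}

maximized over $Z$ so that the representation as a whole expands while each class is compressed.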

Wide neural networks trained with weight decay provably exhibit neural collapse

A Jacot, P Súkeník, Z Wang, M Mondelli - arXiv preprint arXiv:2410.04887, 2024 - arxiv.org
Deep neural networks (DNNs) at convergence consistently represent the training data in the
last layer via a highly symmetric geometric structure referred to as neural collapse. This …