Model complexity of deep learning: A survey

X Hu, L Chu, J Pei, W Liu, J Bian - Knowledge and Information Systems, 2021 - Springer
Abstract Model complexity is a fundamental problem in deep learning. In this paper, we
conduct a systematic overview of the latest studies on model complexity in deep learning …

Continual lifelong learning in natural language processing: A survey

M Biesialska, K Biesialska, MR Costa-jussà - arXiv preprint arXiv …, 2020 - arxiv.org
Continual learning (CL) aims to enable information systems to learn from a continuous data
stream across time. However, it is difficult for existing deep learning architectures to learn a …

Mind the gap: Understanding the modality gap in multi-modal contrastive representation learning

VW Liang, Y Zhang, Y Kwon… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present modality gap, an intriguing geometric phenomenon of the representation space
of multi-modal models. Specifically, we show that different data modalities (e.g., images and …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Training graph neural networks with 1000 layers

G Li, M Müller, B Ghanem… - … conference on machine …, 2021 - proceedings.mlr.press
Deep graph neural networks (GNNs) have achieved excellent results on various tasks on
increasingly large graph datasets with millions of nodes and edges. However, memory …

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

What makes multi-modal learning better than single (provably)

Y Huang, C Du, Z Xue, X Chen… - Advances in Neural …, 2021 - proceedings.neurips.cc
The world provides us with data of multiple modalities. Intuitively, models fusing data from
different modalities outperform their uni-modal counterparts, since more information is …

Theory of overparametrization in quantum neural networks

M Larocca, N Ju, D García-Martín, PJ Coles… - Nature Computational …, 2023 - nature.com
The prospect of achieving quantum advantage with quantum neural networks (QNNs) is
exciting. Understanding how QNN properties (for example, the number of parameters M) …

Picking winning tickets before training by preserving gradient flow

C Wang, G Zhang, R Grosse - arXiv preprint arXiv:2002.07376, 2020 - arxiv.org
Overparameterization has been shown to benefit both the optimization and generalization of
neural networks, but large networks are resource hungry at both training and test time …