Model complexity of deep learning: A survey

X Hu, L Chu, J Pei, W Liu, J Bian - Knowledge and Information Systems, 2021 - Springer
Abstract Model complexity is a fundamental problem in deep learning. In this paper, we
conduct a systematic overview of the latest studies on model complexity in deep learning …

Continual lifelong learning in natural language processing: A survey

M Biesialska, K Biesialska, MR Costa-jussà - arXiv preprint arXiv …, 2020 - arxiv.org
Continual learning (CL) aims to enable information systems to learn from a continuous data
stream across time. However, it is difficult for existing deep learning architectures to learn a …

Mind the gap: Understanding the modality gap in multi-modal contrastive representation learning

VW Liang, Y Zhang, Y Kwon… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present modality gap, an intriguing geometric phenomenon of the representation space
of multi-modal models. Specifically, we show that different data modalities (e.g., images and …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Training graph neural networks with 1000 layers

G Li, M Müller, B Ghanem… - … conference on machine …, 2021 - proceedings.mlr.press
Deep graph neural networks (GNNs) have achieved excellent results on various tasks on
increasingly large graph datasets with millions of nodes and edges. However, memory …

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

What makes multi-modal learning better than single (provably)

Y Huang, C Du, Z Xue, X Chen… - Advances in Neural …, 2021 - proceedings.neurips.cc
The world provides us with data of multiple modalities. Intuitively, models fusing data from
different modalities outperform their uni-modal counterparts, since more information is …

Theory of overparametrization in quantum neural networks

M Larocca, N Ju, D García-Martín, PJ Coles… - Nature Computational …, 2023 - nature.com
The prospect of achieving quantum advantage with quantum neural networks (QNNs) is
exciting. Understanding how QNN properties (for example, the number of parameters M) …

Picking winning tickets before training by preserving gradient flow

C Wang, G Zhang, R Grosse - arXiv preprint arXiv:2002.07376, 2020 - arxiv.org
Overparameterization has been shown to benefit both the optimization and generalization of
neural networks, but large networks are resource hungry at both training and test time …