Model compression and acceleration for deep neural networks: The principles, progress, and challenges

Y Cheng, D Wang, P Zhou… - IEEE Signal Processing …, 2018 - ieeexplore.ieee.org
In recent years, deep neural networks (DNNs) have received increasing attention, have been
applied across a wide range of applications, and have achieved dramatic accuracy improvements in many …

Deep neural network approximation for custom hardware: Where we've been, where we're going

E Wang, JJ Davis, R Zhao, HC Ng, X Niu… - ACM Computing …, 2019 - dl.acm.org
Deep neural networks have proven to be particularly effective in visual and audio
recognition tasks. Existing models tend to be computationally expensive and memory …

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Dynamic neural networks are an emerging research topic in deep learning. Compared to static
models, which have fixed computational graphs and parameters at the inference stage …
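
Below is a minimal sketch of the static-versus-dynamic contrast this survey draws: a residual block guarded by a learned per-sample gate, so the executed graph depends on the input. The module names and the 0.5 threshold are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Residual block run only for inputs a learned gate deems hard,
    so the effective computational graph varies per sample."""
    def __init__(self, dim, threshold=0.5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Linear(dim, 1)   # scores each sample's need for this block
        self.threshold = threshold

    def forward(self, x):
        keep = (torch.sigmoid(self.gate(x)) > self.threshold).squeeze(1)  # (batch,)
        out = x.clone()
        if keep.any():
            out[keep] = x[keep] + self.body(x[keep])  # execute only where needed
        return out

x = torch.randn(8, 16)
print(GatedBlock(16)(x).shape)  # torch.Size([8, 16])
```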

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
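
A toy version of the selective pruning this survey covers: global magnitude pruning, which zeroes the smallest-magnitude weights across the whole model to hit a target sparsity. The 90% level and the helper name are arbitrary choices for this sketch.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the globally smallest-magnitude parameters.
    Biases are included in the global pool for simplicity."""
    weights = torch.cat([p.abs().flatten() for p in model.parameters()])
    k = int(sparsity * weights.numel())
    threshold = weights.kthvalue(k).values          # k-th smallest magnitude
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())   # mask below-threshold weights

net = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
magnitude_prune(net, 0.9)
total = sum(p.numel() for p in net.parameters())
zeros = sum((p == 0).sum().item() for p in net.parameters())
print(f"sparsity: {zeros / total:.2f}")
```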

A survey of model compression and acceleration for deep neural networks

Y Cheng, D Wang, P Zhou, T Zhang - arXiv preprint arXiv:1710.09282, 2017 - arxiv.org
Deep neural networks (DNNs) have recently achieved great success in many visual
recognition tasks. However, existing deep neural network models are computationally …

Outrageously large neural networks: The sparsely-gated mixture-of-experts layer

N Shazeer, A Mirhoseini, K Maziarz, A Davis… - arXiv preprint arXiv …, 2017 - arxiv.org
The capacity of a neural network to absorb information is limited by its number of
parameters. Conditional computation, where parts of the network are active on a per …
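
A condensed sketch of a sparsely-gated mixture-of-experts layer in the spirit of this paper: a router selects the top-k experts per example, so parameter count scales with the number of experts while per-example compute stays roughly flat (k << num_experts). This omits the paper's noisy gating and load-balancing loss; sizes and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Each example is routed to its top-k experts; the others stay idle."""
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):                                     # x: (batch, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)    # top-k per example
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():                 # run each expert only on its examples
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

x = torch.randn(4, 32)
print(SparseMoE(32)(x).shape)  # torch.Size([4, 32])
```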

Runtime neural pruning

J Lin, Y Rao, J Lu, J Zhou - Advances in neural information …, 2017 - proceedings.neurips.cc
In this paper, we propose a Runtime Neural Pruning (RNP) framework which prunes the
deep neural network dynamically at runtime. Unlike existing neural pruning methods …
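
The following sketch conveys the runtime-pruning idea in simplified form. RNP itself trains the channel-selection policy with reinforcement learning; here a plain learned gate with a fixed threshold stands in for that agent, and suppressed channels are still computed then masked (a real implementation would skip their computation).

```python
import torch
import torch.nn as nn

class RuntimeChannelGate(nn.Module):
    """Decide per input which conv channels to keep at inference time."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gate = nn.Linear(in_ch, out_ch)   # stand-in for RNP's RL agent

    def forward(self, x):
        ctx = x.mean(dim=(2, 3))                               # global context per sample
        keep = (torch.sigmoid(self.gate(ctx)) > 0.5).float()   # (batch, out_ch)
        return self.conv(x) * keep[:, :, None, None]           # zero pruned channels

x = torch.randn(2, 16, 8, 8)
print(RuntimeChannelGate(16, 32)(x).shape)  # torch.Size([2, 32, 8, 8])
```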

Dynamic channel pruning: Feature boosting and suppression

X Gao, Y Zhao, Ł Dudziak, R Mullins, C Xu - arXiv preprint arXiv …, 2018 - arxiv.org
Making deep convolutional neural networks more accurate typically comes at the cost of
increased computational and memory resources. In this paper, we reduce this cost by …
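
A rough rendering of feature boosting and suppression: an auxiliary predictor scores each output channel from a global summary of the input, a winner-take-all step keeps the top-k channels per sample, and the survivors are scaled (boosted) by their saliency. The keep ratio and layer shapes are illustrative.

```python
import torch
import torch.nn as nn

class FBSConv(nn.Module):
    """Conv layer whose output channels are suppressed or boosted per input."""
    def __init__(self, in_ch, out_ch, keep_ratio=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.saliency = nn.Linear(in_ch, out_ch)    # predicts channel importance
        self.k = max(1, int(keep_ratio * out_ch))

    def forward(self, x):
        s = torch.relu(self.saliency(x.mean(dim=(2, 3))))    # (batch, out_ch)
        kth = s.topk(self.k, dim=1).values[:, -1:]           # k-th largest per sample
        s = s * (s >= kth).float()                           # winner-take-all mask
        return self.conv(x) * s[:, :, None, None]            # boost survivors by s

x = torch.randn(2, 8, 16, 16)
print(FBSConv(8, 16)(x).shape)  # torch.Size([2, 16, 16, 16])
```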

Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification

Y Lu, A Kumar, S Zhai, Y Cheng… - Proceedings of the …, 2017 - openaccess.thecvf.com
Multi-task learning aims to improve generalization performance of multiple prediction tasks
by appropriately sharing relevant information across them. In the context of deep neural …
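
For contrast with the learned sharing structure this paper proposes, here is the hard-parameter-sharing baseline it builds on: one trunk shared by all tasks and one head per task, with the branch point fixed by hand rather than discovered during training. Dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    """Hard parameter sharing: shared trunk, per-task heads. The cited paper
    instead *learns* where tasks should branch; here the split is fixed."""
    def __init__(self, in_dim, hidden, task_dims):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        h = self.trunk(x)                        # features shared across tasks
        return [head(h) for head in self.heads]  # one prediction per task

x = torch.randn(4, 32)
outs = HardSharedMTL(32, 64, [2, 5, 1])(x)
print([o.shape for o in outs])
```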

Spatially adaptive computation time for residual networks

M Figurnov, MD Collins, Y Zhu… - Proceedings of the …, 2017 - openaccess.thecvf.com
This paper proposes a deep learning architecture based on residual networks that
dynamically adjusts the number of executed layers for different regions of the image. This …
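
A stripped-down version of the halting mechanism behind spatially adaptive computation time: each block emits a halting score, and execution stops once the cumulative score crosses 1 - eps. The paper makes this decision per spatial position of a ResNet feature map; this sketch halts per input for brevity, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class ACTStack(nn.Module):
    """Adaptive computation time over residual blocks: evaluation stops
    early once the accumulated halting score reaches 1 - eps."""
    def __init__(self, dim, depth=6, eps=0.01):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth))
        self.halt = nn.ModuleList(nn.Linear(dim, 1) for _ in range(depth))
        self.eps = eps

    def forward(self, x):
        cum = 0.0
        for block, halt in zip(self.blocks, self.halt):
            x = x + block(x)                            # residual update
            cum += torch.sigmoid(halt(x)).mean().item() # batch-mean halting score
            if cum >= 1.0 - self.eps:                   # halt early on easy inputs
                break
        return x

print(ACTStack(16)(torch.randn(3, 16)).shape)  # torch.Size([3, 16])
```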