In this paper, we present a systematic approach to assessing and comparing the computational complexity of neural network layers in digital signal processing. We provide …
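Concretely, the per-layer cost that such a comparison measures is often expressed in multiply-accumulate (MAC) counts. A minimal sketch of that accounting, assuming MACs as the metric (our assumption; the paper's actual methodology may differ):

```python
# Rough multiply-accumulate (MAC) counts for common layers; a sketch,
# not the paper's methodology.

def dense_macs(in_features: int, out_features: int) -> int:
    # Each output neuron sums over all inputs: one MAC per (input, output) pair.
    return in_features * out_features

def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int,
                k_h: int, k_w: int) -> int:
    # Each output position performs a k_h * k_w * c_in dot product per output channel.
    return h_out * w_out * c_out * k_h * k_w * c_in

# Example: a 3x3 conv, 64 -> 128 channels, on a 56x56 output map.
print(conv2d_macs(56, 56, 64, 128, 3, 3))  # ~231M MACs
```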
This chapter surveys approaches to the problem of quantizing the numerical values in deep neural network computations, covering the advantages and disadvantages of current methods …
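As background for the methods such a survey covers, uniform affine quantization is the generic scheme most of them build on. A minimal int8 round-trip sketch, not tied to any specific method from the chapter:

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    # Map the float range [x.min(), x.max()] onto signed integers.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).max())  # worst-case quantization error
```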
While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is …
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their …
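The simplest instance of such selective pruning is magnitude pruning: zero out the weights with the smallest absolute values. A minimal sketch of that generic idea (the selection criterion in the cited work may differ):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero out the `sparsity` fraction of weights with the smallest magnitude.
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

w = torch.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.9)
print((w_pruned == 0).float().mean())  # ~0.9
```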
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values. This hidden cost limits the latency …
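The round trip this refers to can be made concrete: in simulated ("fake") quantization, every layer dequantizes to float to compute, whereas an integer-only pipeline keeps values in int8/int32 throughout. A sketch contrasting the two, with illustrative function names and scales:

```python
import numpy as np

def fake_quant_layer(x_f32, w_f32, s_x, s_w):
    # Simulated quantization: quantize, then immediately return to float,
    # so the matmul itself still runs in float32.
    x_q = np.round(x_f32 / s_x)          # float -> int (conceptually)
    w_q = np.round(w_f32 / s_w)
    return (x_q * s_x) @ (w_q * s_w)     # ... and int -> float again

def integer_only_layer(x_q, w_q, s_x, s_w, s_out):
    # Integer matmul with int32 accumulation; a single rescale at the end.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    # Real integer-only pipelines approximate the float multiplier
    # s_x * s_w / s_out by a dyadic number b / 2**c (integer multiply
    # plus bit shift), so no float ever appears.
    out = np.round(acc * (s_x * s_w / s_out))
    return np.clip(out, -128, 127).astype(np.int8)
```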
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad-hoc comparisons between the two have been …
We propose a simple new approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the …
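The idea can be sketched in a few lines: fit a small coordinate-based MLP that maps (x, y) positions to RGB values, then store only its weights. Architecture, activation, and training details below are placeholders, not the paper's:

```python
import torch
import torch.nn as nn

# Overfit a tiny MLP to one image; the "compressed image" is its weights.
model = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3), nn.Sigmoid(),
)

h, w = 32, 32
ys, xs = torch.meshgrid(torch.linspace(0, 1, h),
                        torch.linspace(0, 1, w), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (h*w, 2) pixel coordinates
image = torch.rand(h * w, 3)                           # stand-in for a real image

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), image)
    loss.backward()
    opt.step()
# Storage cost = number of parameters, independent of image resolution.
```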
Transformer-based architectures have become the de facto standard models for a wide range of Natural Language Processing tasks. However, their memory footprint and high …
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are …
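Structured pruning removes whole units, here entire convolution output channels, so the pruned network stays dense and needs no sparse kernels. A minimal L2-norm channel-selection sketch (illustrative, not any particular method); note that downstream layers' input channels must be adjusted to match:

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    # Rank output channels by the L2 norm of their filters, keep the largest.
    norms = conv.weight.detach().flatten(1).norm(dim=1)  # one norm per channel
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep = norms.topk(n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[keep]
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[keep]
    return new_conv

conv = nn.Conv2d(64, 128, 3, padding=1)
print(prune_conv_channels(conv, keep_ratio=0.5))  # Conv2d(64, 64, ...)
```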