An overview of neural network compression

J O'Neill - arXiv preprint arXiv:2006.03669, 2020 - arxiv.org
Overparameterized networks trained to convergence have shown impressive performance
in domains such as computer vision and natural language processing. Pushing state of the …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
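
For orientation, the basic object such surveys study is uniform affine quantization, which maps floating-point values onto a small integer grid via a scale and zero-point. The NumPy sketch below is an illustrative reconstruction of that standard scheme, not code from the survey; the function names and the 8-bit default are assumptions.

    import numpy as np

    def quantize_uniform(x, num_bits=8):
        # Uniform affine quantization: map floats in [x.min(), x.max()]
        # onto the integer grid [0, 2**num_bits - 1] via scale and zero-point.
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = np.clip(np.round(-x.min() / scale), qmin, qmax)
        return np.clip(np.round(x / scale) + zero_point, qmin, qmax), scale, zero_point

    def dequantize(q, scale, zero_point):
        # Recover an approximation of the original floats.
        return scale * (q - zero_point)

    w = np.random.randn(4, 4).astype(np.float32)
    q, s, z = quantize_uniform(w)
    print(np.abs(w - dequantize(q, s, z)).max())  # roughly bounded by scale / 2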

LeViT: a vision transformer in ConvNet's clothing for faster inference

B Graham, A El-Nouby, H Touvron… - Proceedings of the …, 2021 - openaccess.thecvf.com
We design a family of image classification architectures that optimize the trade-off between
accuracy and efficiency in a high-speed regime. Our work exploits recent findings in …

A white paper on neural network quantization

M Nagel, M Fournarakis, RA Amjad… - arXiv preprint arXiv …, 2021 - arxiv.org
While neural networks have advanced the frontiers in many applications, they often come at
a high computational cost. Reducing the power and latency of neural network inference is …

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

CocktailSGD: Fine-tuning foundation models over 500Mbps networks

J Wang, Y Lu, B Yuan, B Chen… - International …, 2023 - proceedings.mlr.press
Distributed training of foundation models, especially large language models (LLMs), is
communication-intensive and so has heavily relied on centralized data centers with fast …

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …

Up or down? Adaptive rounding for post-training quantization

M Nagel, RA Amjad, M Van Baalen… - International …, 2020 - proceedings.mlr.press
When quantizing neural networks, assigning each floating-point weight to its nearest
fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the …
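
The "nearest fixed-point value" baseline the abstract questions is plain round-to-nearest; AdaRound instead lets each weight round either up or down and learns that choice by minimizing a per-layer reconstruction loss. The sketch below contrasts the two, with a random 0/1 mask standing in for the learned rounding decision; the mask and function names are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def round_to_nearest(w, scale):
        # Predominant baseline: snap each weight to its nearest grid point.
        return np.round(w / scale) * scale

    def round_up_or_down(w, scale, round_up):
        # AdaRound-style freedom: each weight independently rounds its
        # quantized value up or down. The paper learns this choice via a
        # per-layer reconstruction loss; a given 0/1 mask stands in here.
        return (np.floor(w / scale) + round_up) * scale

    scale = 0.1
    w = np.random.randn(8).astype(np.float32)
    mask = (np.random.rand(8) > 0.5).astype(np.float32)  # placeholder, not learned
    print(round_to_nearest(w, scale))
    print(round_up_or_down(w, scale, mask))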

SPINN: synergistic progressive inference of neural networks over device and cloud

S Laskaridis, SI Venieris, M Almeida… - Proceedings of the 26th …, 2020 - dl.acm.org
Despite the soaring use of convolutional neural networks (CNNs) in mobile applications,
uniformly sustaining high-performance inference on mobile has been elusive due to the …

TernaryBERT: Distillation-aware ultra-low bit BERT

W Zhang, L Hou, Y Yin, L Shang, X Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer-based pre-trained models like BERT have achieved remarkable performance
in many natural language processing tasks. However, these models are both computation …
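
For context, ternarization constrains each weight to a scaled value in {-1, 0, +1}. The sketch below follows the classic Ternary Weight Networks threshold rule (Li & Liu, 2016) as a stand-in; TernaryBERT's distillation-aware training loop is not reproduced, and the 0.7 · mean|w| threshold is that earlier heuristic, not a constant from this paper.

    import numpy as np

    def ternarize(w):
        # TWN-style ternarization: approximate w by alpha * t with
        # t in {-1, 0, +1}, zeroing weights below a magnitude threshold.
        delta = 0.7 * np.mean(np.abs(w))             # TWN threshold heuristic
        t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
        alpha = np.abs(w[np.abs(w) > delta]).mean()  # scale fit to kept weights
        return alpha * t

    w = np.random.randn(6, 6).astype(np.float32)
    print(ternarize(w))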