Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising direction as performance improvements
in general-purpose processors slow down due to the foreseeable end of Moore's Law …

A comprehensive survey on model compression and acceleration

T Choudhary, V Mishra, A Goswami… - Artificial Intelligence …, 2020 - Springer
In recent years, machine learning (ML) and deep learning (DL) have brought remarkable
improvements to computer vision, natural language processing, stock prediction, forecasting …
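
As a concrete taste of the techniques such surveys catalog, here is a minimal sketch of magnitude-based weight pruning, one of the simplest compression methods; the function name and the 90% sparsity level are illustrative choices, not taken from the paper.

    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the given fraction of smallest-magnitude weights (sketch)."""
        k = int(sparsity * weights.size)
        if k == 0:
            return weights.copy()
        # Threshold at the k-th smallest absolute value.
        threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
        return weights * (np.abs(weights) > threshold)

    w = np.random.randn(256, 256).astype(np.float32)
    w_sparse = magnitude_prune(w, sparsity=0.9)  # keeps roughly 10% of weights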

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter surveys approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages and disadvantages of current methods …
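
A minimal sketch of uniform affine quantization, the baseline scheme such surveys analyze and compare against; the helper names are mine, and a real implementation would also guard against a zero range.

    import numpy as np

    def quantize_affine(x: np.ndarray, num_bits: int = 8):
        """Map floats to unsigned integers via a scale and a zero point (sketch)."""
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(round(qmin - x.min() / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
        return q, scale, zero_point

    def dequantize_affine(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    x = np.random.randn(1024).astype(np.float32)
    q, s, z = quantize_affine(x)
    max_err = np.abs(x - dequantize_affine(q, s, z)).max()  # bounded by ~scale/2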

Q-BERT: Hessian based ultra low precision quantization of BERT

S Shen, Z Dong, J Ye, L Ma, Z Yao, A Gholami… - Proceedings of the AAAI …, 2020 - aaai.org
Transformer-based architectures have become the de facto models used for a range of Natural
Language Processing tasks. In particular, BERT-based models achieved significant …
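
Q-BERT's central idea is to rank layers by second-order (Hessian) sensitivity and give more sensitive layers higher bit widths. A toy sketch of that assignment step, with made-up sensitivity scores standing in for the Hessian spectrum estimates the paper obtains via power iteration:

    # Hypothetical per-layer sensitivities (e.g., top Hessian eigenvalue
    # estimates); the estimation step itself is omitted here.
    sensitivities = {"layer0": 12.4, "layer1": 0.8, "layer2": 3.1, "layer3": 0.2}

    def assign_bits(sens: dict, budget=(2, 2, 4, 8)) -> dict:
        """Give the lowest bit widths to the least sensitive layers (sketch)."""
        ordered = sorted(sens, key=sens.get)  # least sensitive first
        return dict(zip(ordered, sorted(budget)))

    print(assign_bits(sensitivities))
    # {'layer3': 2, 'layer1': 2, 'layer2': 4, 'layer0': 8}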

Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

ESPNetv2: A light-weight, power efficient, and general purpose convolutional neural network

S Mehta, M Rastegari, L Shapiro… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce a light-weight, power-efficient, and general-purpose convolutional neural
network, ESPNetv2, for modeling visual and sequential data. Our network uses group point …
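
The ingredients the abstract names, grouped point-wise and depth-wise dilated separable convolutions, can be sketched in PyTorch as below. The actual ESPNetv2 unit is more elaborate, so this only illustrates why the combination is cheap: the grouped 1x1 convolution mixes channels at reduced cost, and the depth-wise dilated convolution enlarges the receptive field with one filter per channel.

    import torch
    import torch.nn as nn

    class SeparableBlock(nn.Module):
        """Grouped point-wise + depth-wise dilated convolution (illustrative)."""
        def __init__(self, channels: int, groups: int = 4, dilation: int = 2):
            super().__init__()
            # Grouped 1x1 conv: channel mixing within groups, fewer parameters.
            self.pointwise = nn.Conv2d(channels, channels, 1, groups=groups)
            # Depth-wise dilated 3x3 conv: per-channel spatial filtering.
            self.depthwise = nn.Conv2d(channels, channels, 3, padding=dilation,
                                       dilation=dilation, groups=channels)

        def forward(self, x):
            return self.depthwise(self.pointwise(x))

    y = SeparableBlock(32)(torch.randn(1, 32, 56, 56))  # shape is preserved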

LUT-GEMM: Quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models

G Park, B Park, M Kim, S Lee, J Kim, B Kwon… - arXiv preprint arXiv …, 2022 - arxiv.org
The recent advancements in self-supervised learning, combined with the Transformer
architecture, have enabled natural language processing (NLP) to achieve remarkably low …
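
The trick behind LUT-based GEMM: once weights are binary-coded, every length-mu slice of the activation vector admits only 2^mu possible signed partial sums, which can be precomputed once and then indexed by the weight bit patterns. A toy single-bit-component version follows; the paper's kernel handles multi-bit binary-coding quantization with scale factors and runs on GPUs, so this NumPy sketch only demonstrates the lookup idea.

    import numpy as np

    def lut_matvec(B: np.ndarray, x: np.ndarray, mu: int = 8) -> np.ndarray:
        """y = B @ x for B in {-1,+1}, using table lookups instead of multiplies."""
        m, n = B.shape
        assert n % mu == 0
        # All 2**mu signed partial-sum patterns for a length-mu slice.
        signs = np.array([[1 if (p >> j) & 1 else -1 for j in range(mu)]
                          for p in range(2 ** mu)], dtype=x.dtype)
        y = np.zeros(m, dtype=x.dtype)
        for g in range(n // mu):
            xg = x[g * mu:(g + 1) * mu]
            lut = signs @ xg  # 2**mu partial sums, shared by all output rows
            idx = ((B[:, g * mu:(g + 1) * mu] > 0)
                   * (1 << np.arange(mu))).sum(axis=1)  # bit pattern per row
            y += lut[idx]
        return y

    rng = np.random.default_rng(0)
    B = rng.choice([-1, 1], size=(64, 128)).astype(np.float32)
    x = rng.standard_normal(128, dtype=np.float32)
    assert np.allclose(lut_matvec(B, x), B @ x, atol=1e-3)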

Understanding and overcoming the challenges of efficient transformer quantization

Y Bondarenko, M Nagel, T Blankevoort - arXiv preprint arXiv:2109.12948, 2021 - arxiv.org
Transformer-based architectures have become the de facto standard models for a wide
range of Natural Language Processing tasks. However, their memory footprint and high …
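
One concrete difficulty this line of work examines is that transformer activations contain large outliers, which blow up the per-tensor quantization range and waste resolution on the bulk of the values. A toy numeric illustration (my example, not the paper's):

    import numpy as np

    def int8_roundtrip(x: np.ndarray) -> np.ndarray:
        """Symmetric per-tensor INT8 quantize-dequantize (sketch)."""
        scale = np.abs(x).max() / 127.0
        return np.round(x / scale).clip(-127, 127) * scale

    x = np.random.randn(4096).astype(np.float32)
    x_out = x.copy()
    x_out[0] = 60.0  # a single large activation outlier

    print(np.abs(x - int8_roundtrip(x)).mean())          # small error
    print(np.abs(x_out - int8_roundtrip(x_out)).mean())  # over 10x larger error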

A survey on methods and theories of quantized neural networks

Y Guo - arXiv preprint arXiv:1808.04752, 2018 - arxiv.org
Deep neural networks are the state-of-the-art methods for many real-world tasks, such as
computer vision, natural language processing and speech recognition. For all its popularity …

Compression of deep learning models for text: A survey

M Gupta, P Agrawal - ACM Transactions on Knowledge Discovery from …, 2022 - dl.acm.org
In recent years, the fields of natural language processing (NLP) and information retrieval (IR)
have made tremendous progress thanks to deep learning models like Recurrent Neural …