LLM-QAT: Data-free quantization aware training for large language models

Z Liu, B Oguz, C Zhao, E Chang, P Stock… - arXiv preprint arXiv …, 2023 - arxiv.org
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …

QLLM: Accurate and efficient low-bitwidth quantization for large language models

J Liu, R Gong, X Wei, Z Dong, J Cai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread
deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …

BiT: Robustly binarized multi-distilled transformer

Z Liu, B Oguz, A Pappu, L Xiao, S Yih… - Advances in neural …, 2022 - proceedings.neurips.cc
Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine
learning, but have also grown in parameters and computational complexity, making them …

Q-DETR: An efficient low-bit quantized detection transformer

S Xu, Y Li, M Lin, P Gao, G Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent detection transformer (DETR) has advanced object detection, but its application
on resource-constrained devices requires massive computation and memory resources …

A survey on deep learning hardware accelerators for heterogeneous HPC platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …

A comprehensive survey of compression algorithms for language models

S Park, J Choi, S Lee, U Kang - arXiv preprint arXiv:2401.15347, 2024 - arxiv.org
How can we compress language models without sacrificing accuracy? The number of
compression algorithms for language models is rapidly growing to benefit from remarkable …

Oscillation-free quantization for low-bit vision transformers

SY Liu, Z Liu, KT Cheng - International Conference on …, 2023 - proceedings.mlr.press
Weight oscillation is a by-product of quantization-aware training, in which quantized weights
frequently jump between two quantized levels, resulting in training instability and a sub …

IRGen: Generative modeling for image retrieval

Y Zhang, T Zhang, D Chen, Y Wang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
While generative modeling has been ubiquitous in natural language processing and
computer vision, its application to image retrieval remains unexplored. In this paper, we …