Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs

H Lin, H Xu, Y Wu, J Cui, Y Zhang, L Mou… - arXiv preprint arXiv …, 2024 - arxiv.org
Quantizing large language models (LLMs) presents significant challenges, primarily due to
outlier activations that compromise the efficiency of low-bit representation. Traditional …
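
The title points at a rotate-then-quantize scheme: multiplying activations by an orthogonal matrix Q (and weights by Q^T) leaves the layer's output unchanged but spreads an outlier channel's energy across all channels, shrinking the quantization range. A minimal sketch of that general idea follows; the random orthogonal Q and the INT4 quantizer are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of rotation-based outlier smoothing before quantization.
# The random orthogonal matrix Q is illustrative, not the paper's choice.
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix yields a random orthogonal Q.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def quantize_int4(x):
    # Symmetric per-tensor 4-bit quantization (scale set by max |x|).
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

d = 64
X = rng.standard_normal((16, d))
X[:, 3] *= 50.0                      # inject an outlier channel

W = rng.standard_normal((d, d))
Q = random_orthogonal(d)

# Rotation preserves the product: (X @ Q) @ (Q.T @ W) == X @ W,
# but spreads the outlier's energy across all channels.
plain = quantize_int4(X) @ W
rotated = quantize_int4(X @ Q) @ (Q.T @ W)

ref = X @ W
print("error w/o rotation:", np.abs(plain - ref).mean())
print("error w/  rotation:", np.abs(rotated - ref).mean())
```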

BiSup: Bidirectional Quantization Error Suppression for Large Language Models

M Zou, R Guo, S Zhang, X Zhang, Z Feng - arXiv preprint arXiv …, 2024 - arxiv.org
As the size and context length of Large Language Models (LLMs) grow, weight-activation
quantization has emerged as a crucial technique for efficient deployment of LLMs …

Rethinking channel dimensions to isolate outliers for low-bit weight quantization of large language models

JH Heo, J Kim, B Kwon, B Kim, SJ Kwon… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable success across
various tasks. However, efficiently serving LLMs has been a challenge due to their large …
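
The title suggests choosing which channel dimension to group scales over so that an outlier input channel is isolated under its own scale. A hedged sketch of that contrast follows; the group granularity and bit width are assumptions for illustration, not the paper's recipe.

```python
# Contrast per-output-channel vs. per-input-channel weight quantization:
# grouping along the input dimension isolates an outlier input channel.
import numpy as np

def quantize_rowwise(W, bits=4):
    # One scale per row of W (rows = whichever channel axis we group by).
    qmax = 2 ** (bits - 1) - 1
    scales = np.abs(W).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(W / scales), -qmax - 1, qmax) * scales

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))   # [out_channels, in_channels]
W[:, 5] *= 40.0                       # one outlier *input* channel

# Per-output-channel: the outlier inflates every row's scale.
per_oc = quantize_rowwise(W)
# Per-input-channel: transpose so each input channel gets its own scale.
per_ic = quantize_rowwise(W.T).T

print("per-OC error:", np.abs(per_oc - W).mean())
print("per-IC error:", np.abs(per_ic - W).mean())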

Optimize weight rounding via signed gradient descent for the quantization of LLMs

W Cheng, W Zhang, H Shen, Y Cai, X He, K Lv… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have proven their exceptional capabilities in performing
language-related tasks. However, their deployment poses significant challenges due to their …
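
The title names the mechanism: instead of rounding weights to the nearest grid point, learn a small perturbation on the rounding input and update it with the sign of the gradient, using a straight-through estimator through round(). A hedged sketch follows; the learning rate, loss, and clipping range are assumptions for illustration.

```python
# Sketch of signed-gradient rounding tuning: learn offsets V in [-0.5, 0.5]
# on the rounding input, updated by the sign of a straight-through gradient.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))
X = rng.standard_normal((256, 32))     # calibration activations
scale = np.abs(W).max() / 7.0          # symmetric 4-bit scale

V = np.zeros_like(W)                   # learnable rounding offsets
lr = 2.0 / 200                         # step size (assumed schedule)

for step in range(200):
    Wq = np.clip(np.round(W / scale + V), -8, 7) * scale
    err = X @ Wq - X @ W               # output-space reconstruction error
    # Straight-through: treat round() as identity when backpropagating.
    grad_V = (X.T @ err) / len(X) * scale
    V -= lr * np.sign(grad_V)          # signed gradient descent step
    V = np.clip(V, -0.5, 0.5)          # stay within one rounding grid step

print("final output MSE:", (err ** 2).mean())
```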

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

Y Liu, Y Meng, F Wu, S Peng, H Yao, C Guan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have exhibited exciting progress in multiple scenarios, while
their huge computational demands hinder their deployment in many real-world applications …

RPTQ: Reorder-based post-training quantization for large language models

Z Yuan, L Niu, J Liu, W Liu, X Wang, Y Shang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large-scale language models (LLMs) have demonstrated impressive performance, but their
deployment presents challenges due to their significant memory usage. This issue can be …
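
The reorder-based idea: rearrange activation channels so that channels with similar value ranges share a quantization group, preventing a few outlier channels from inflating every group's scale. In the sketch below, sorting by per-channel range stands in for the paper's clustering, and the group count is an assumption.

```python
# Reorder-then-group sketch: sort channels by activation range so channels
# with similar magnitudes share one quantizer; sorting stands in for clustering.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 64))
X[:, [3, 17, 42]] *= 30.0            # a few outlier channels

def group_quantize(X, order, n_groups=4, bits=4):
    qmax = 2 ** (bits - 1) - 1
    Xq = np.empty_like(X)
    for g in np.array_split(order, n_groups):
        s = np.abs(X[:, g]).max() / qmax        # one scale per group
        Xq[:, g] = np.clip(np.round(X[:, g] / s), -qmax - 1, qmax) * s
    return Xq

naive = group_quantize(X, np.arange(X.shape[1]))           # original order
ranges = X.max(axis=0) - X.min(axis=0)
reordered = group_quantize(X, np.argsort(ranges))          # reorder first

print("naive grouping error:    ", np.abs(naive - X).mean())
print("reordered grouping error:", np.abs(reordered - X).mean())
```

With the original order the three outlier channels land in three different groups and inflate each group's scale; after reordering they share one group, leaving the other groups' resolution intact.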

QLLM: Accurate and efficient low-bitwidth quantization for large language models

J Liu, R Gong, X Wei, Z Dong, J Cai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) excel in NLP, but their computational demands hinder their
widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …

Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models

W Cui, Q Wang - arXiv preprint arXiv:2404.02837, 2024 - arxiv.org
This paper reveals the phenomenon of parameter heterogeneity in large language models
(LLMs). We find that a small subset of "cherry" parameters exhibits a disproportionately large …
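
The abstract points at a mixed-precision consequence: keep the tiny fraction of high-impact "cherry" parameters in full precision and quantize the rest. A hedged sketch follows; the sensitivity proxy (|w| scaled by an activation statistic) and the 1% budget are illustrative assumptions.

```python
# Mixed-precision sketch: protect the top ~1% highest-impact parameters
# in full precision and quantize everything else to INT4.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
act_norm = np.abs(rng.standard_normal(64)) + 0.1   # per-input-channel stat

def quantize_int4(w):
    s = np.abs(w).max() / 7.0
    return np.clip(np.round(w / s), -8, 7) * s

impact = np.abs(W) * act_norm[None, :]             # crude sensitivity proxy
k = int(0.01 * W.size)                             # keep ~1% in full precision
cherry = np.zeros_like(W, dtype=bool)
cherry.flat[np.argsort(impact.ravel())[-k:]] = True

Wq = quantize_int4(W)
Wq[cherry] = W[cherry]                             # restore cherry parameters

print("kept in FP:", cherry.sum(), "of", W.size)
print("quant error:", np.abs(Wq - W).mean())
```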

OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …
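
The title's outlier-victim pairing: when one value in an adjacent pair is an outlier, its neighbor (the "victim") is pruned to zero, and the freed encoding budget represents the outlier at a wider range, keeping the memory layout hardware-friendly. The thresholds and the coarse wide-range grid in this sketch are assumptions for illustration.

```python
# Outlier-victim pair sketch: prune the outlier's neighbor and reuse its
# encoding slot to store the outlier on a coarse, wide-range grid.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
x[4] = 25.0                            # one outlier

def q4(v, scale):
    return np.clip(np.round(v / scale), -8, 7) * scale

normal_scale = 1.0 / 7.0               # fits typical values
outlier_scale = 4.0                    # coarse grid with wide range
threshold = 4.0

out = np.empty_like(x)
for i in range(0, len(x), 2):          # process adjacent pairs
    a, b = x[i], x[i + 1]
    if abs(a) > threshold or abs(b) > threshold:
        # Victim mechanism: zero the smaller value, encode the outlier
        # with the wide-range quantizer in the pair's combined slot.
        if abs(a) >= abs(b):
            out[i], out[i + 1] = q4(a, outlier_scale), 0.0
        else:
            out[i], out[i + 1] = 0.0, q4(b, outlier_scale)
    else:
        out[i], out[i + 1] = q4(a, normal_scale), q4(b, normal_scale)

print("max error:", np.abs(out - x).max())
```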

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact

R Liu, H Bai, H Lin, Y Li, H Gao, Z Xu, L Hou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) excel in natural language processing but demand intensive
computation. To mitigate this, various quantization methods have been explored, yet they …
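
The title's mechanism in miniature: when quantizing the KV cache, restore the leading "pivot" tokens (such as the first token, which typically draws heavy attention) to full precision. The pivot count and the per-token INT8 scheme below are illustrative assumptions.

```python
# KV-cache sketch: quantize per token, then keep pivot tokens intact in FP.
import numpy as np

rng = np.random.default_rng(0)
kv = rng.standard_normal((128, 64))    # [seq_len, head_dim] cache slice
kv[0] *= 20.0                          # pivot token with outlier magnitudes

def q8_per_token(t):
    s = np.abs(t).max(axis=-1, keepdims=True) / 127.0
    return np.clip(np.round(t / s), -128, 127) * s

n_pivot = 1                            # assumed pivot budget
kv_q = q8_per_token(kv)
kv_q[:n_pivot] = kv[:n_pivot]          # keep pivot tokens intact

print("error on pivots:", np.abs(kv_q[:n_pivot] - kv[:n_pivot]).max())
print("mean error rest:", np.abs(kv_q[n_pivot:] - kv[n_pivot:]).mean())
```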