M Zou, R Guo, S Zhang, X Zhang, Z Feng - arXiv preprint arXiv …, 2024 - arxiv.org
As the size and context length of Large Language Models (LLMs) grow, weight-activation quantization has emerged as a crucial technique for efficient deployment of LLMs …
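The snippet above names weight-activation quantization, which maps both weight and activation tensors to low-bit integers before matrix multiplication. Below is a minimal sketch of symmetric per-tensor INT8 (W8A8) quantization; the function names and the numpy-based simulation are illustrative assumptions, not the method of the cited paper.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: returns int8 tensor and scale."""
    scale = float(np.abs(x).max()) / 127.0  # map max magnitude to int8 range
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantized_matmul(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Simulate a W8A8 matmul: quantize activations and weights, accumulate
    in int32, then rescale back to float."""
    qa, sa = quantize_int8(a)
    qw, sw = quantize_int8(w)
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # integer accumulation
    return acc.astype(np.float32) * (sa * sw)        # dequantize

# Example: compare against the full-precision result
a = np.random.randn(4, 16).astype(np.float32)   # activations
w = np.random.randn(16, 8).astype(np.float32)   # weights
err = np.abs(quantized_matmul(a, w) - a @ w).max()
print(f"max abs error vs FP32: {err:.4f}")
```

Real deployments typically refine this baseline with per-channel or per-token scales and outlier handling, which is where the methods surveyed in this listing differ.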
Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to their large …
W Cheng, W Zhang, H Shen, Y Cai, X He, K Lv… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have proven exceptionally capable at language-related tasks. However, their deployment poses significant challenges due to their …
Y Liu, Y Meng, F Wu, S Peng, H Yao, C Guan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have exhibited exciting progress in multiple scenarios, but their huge computational demands hinder deployment in many real-world applications …
Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be …
Large Language Models (LLMs) excel in NLP, but their computational demands hinder widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …
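For readers unfamiliar with the QAT mentioned above: it trains against simulated (fake) quantization so the model adapts to rounding error, using a straight-through estimator (STE) to pass gradients through the non-differentiable rounding. The sketch below shows that core mechanic under an assumed symmetric INT8 scheme; it is not the specific method of the cited paper.

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Simulated INT8 quantization with a straight-through estimator:
    the forward pass rounds, the backward pass passes gradients unchanged."""
    @staticmethod
    def forward(ctx, x, scale):
        return torch.clamp(torch.round(x / scale), -127, 127) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # STE: identity gradient for x, none for scale

# Example: one QAT-style step on a toy linear layer
w = torch.randn(8, 8, requires_grad=True)
x = torch.randn(4, 8)
scale = w.abs().max().detach() / 127.0
y = x @ FakeQuant.apply(w, scale)  # forward uses the quantized weights
y.sum().backward()                 # gradients flow through the rounding
print(w.grad.abs().mean())
```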
W Cui, Q Wang - arXiv preprint arXiv:2404.02837, 2024 - arxiv.org
This paper reveals the phenomenon of parameter heterogeneity in large language models (LLMs). We find that a small subset of "cherry" parameters exhibits a disproportionately large …
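The snippet above introduces "cherry" parameters: a small fraction of weights with outsized impact on model behavior. A hedged illustration of flagging such parameters is given below; the magnitude criterion and the 1% fraction are assumptions for illustration, since the snippet does not specify the paper's actual influence metric.

```python
import numpy as np

def find_cherry_params(w: np.ndarray, fraction: float = 0.01) -> np.ndarray:
    """Return a boolean mask flagging the top `fraction` of weights by
    magnitude (an assumed proxy for parameter influence)."""
    k = max(1, int(w.size * fraction))
    threshold = np.partition(np.abs(w).ravel(), -k)[-k]  # k-th largest magnitude
    return np.abs(w) >= threshold

# Example: such a mask could exempt flagged weights from quantization,
# keeping them in full precision while the rest are quantized.
w = np.random.randn(256, 256).astype(np.float32)
mask = find_cherry_params(w, fraction=0.01)
print(f"flagged {mask.sum()} of {w.size} weights as high-impact")
```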
Transformer-based large language models (LLMs) have achieved great success as model sizes have grown. LLM size grows by 240× every two years, which outpaces the …
Large language models (LLMs) excel in natural language processing but demand intensive computation. To mitigate this, various quantization methods have been explored, yet they …