QuIP: 2-bit quantization of large language models with guarantees

J Chee, Y Cai, V Kuleshov… - Advances in Neural …, 2024 - proceedings.neurips.cc
This work studies post-training parameter quantization in large language models (LLMs).
We introduce quantization with incoherence processing (QuIP), a new method based on the …

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Norm tweaking: High-performance low-bit quantization of large language models

L Li, Q Li, B Zhang, X Chu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
As the size of large language models (LLMs) continues to grow, model compression without
sacrificing accuracy has become a crucial challenge for deployment. In this paper, we …

QLLM: Accurate and efficient low-bitwidth quantization for large language models

J Liu, R Gong, X Wei, Z Dong, J Cai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread
deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …