A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - Transactions of the Association for …, 2024 - direct.mit.edu
Large Language Models (LLMs) have successfully transformed natural language processing
tasks. Yet, their large size and high computational needs pose challenges for …

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

Medusa: Simple LLM inference acceleration framework with multiple decoding heads

T Cai, Y Li, Z Geng, H Peng, JD Lee, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The inference process in Large Language Models (LLMs) is often limited due to the absence
of parallelism in the auto-regressive decoding process, resulting in most operations being …
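
The sequential bottleneck that the Medusa abstract refers to can be seen in a plain greedy decoding loop, sketched below as an illustration only (not Medusa's multi-head method). The `logits_fn` interface and the toy model are hypothetical stand-ins: each new token depends on the one before it, so the forward passes cannot be parallelized across decode steps.

```python
# Minimal sketch of greedy auto-regressive decoding (illustration only, not
# Medusa's method): each step feeds the full prefix back through the model,
# so decode steps run strictly one after another.
import torch

def greedy_decode(logits_fn, prompt_ids, max_new_tokens=32, eos_id=None):
    """logits_fn: hypothetical callable mapping a (1, seq_len) LongTensor of
    token ids to (1, seq_len, vocab_size) logits."""
    ids = prompt_ids.clone()
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)                    # one forward pass per new token
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy pick at the last position
        ids = torch.cat([ids, next_id[:, None]], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return ids

# Toy stand-in for a language model: random logits over a 100-token vocabulary.
toy_logits_fn = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 100)
print(greedy_decode(toy_logits_fn, torch.tensor([[1, 2, 3]]), max_new_tokens=5))
```

Approaches like Medusa attach extra decoding heads so several candidate tokens can be proposed and verified per forward pass, reducing the number of sequential steps in this loop.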

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

Extreme compression of large language models via additive quantization

V Egiazarian, A Panferov, D Kuznedelev… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of accurate open large language models (LLMs) has led to a race toward
quantization techniques that enable such models to run on end-user devices. In this …