A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - Transactions of the Association for …, 2024 - direct.mit.edu
Large Language Models (LLMs) have successfully transformed natural language processing
tasks. Yet, their large size and high computational needs pose challenges for …

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Survey of different large language model architectures: Trends, benchmarks, and challenges

M Shao, A Basit, R Karri, M Shafique - IEEE Access, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) represent a class of deep learning models adept at
understanding natural language and generating coherent text in response to prompts or …

VPTQ: Extreme low-bit vector post-training quantization for large language models

Y Liu, J Wen, Y Wang, S Ye, LL Zhang, T Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling model size significantly challenges the deployment and inference of Large
Language Models (LLMs). Due to the redundancy in LLM weights, recent research has …
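The VPTQ entry above only names vector post-training quantization before the snippet is cut off. As a rough illustration of the general idea (grouping weights into short vectors and replacing each group with the nearest entry of a learned codebook), the sketch below runs plain k-means over weight groups. It is an assumption-laden toy, not the paper's algorithm; the function name, group size, and codebook width are illustrative choices only.

# Minimal sketch of codebook-based vector quantization of a weight matrix.
# Illustrative only: a generic k-means scheme, not VPTQ's actual method.
import numpy as np

def vector_quantize(W, group_size=4, codebook_bits=8, iters=20, seed=0):
    """Split each row of W into contiguous groups of `group_size` values,
    fit a shared codebook of 2**codebook_bits centroids with plain k-means,
    and return the codebook, per-group indices, and the dequantized matrix."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    assert cols % group_size == 0, "cols must be divisible by group_size"
    vectors = W.reshape(-1, group_size)                # each row is one weight group
    k = 2 ** codebook_bits
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()

    for _ in range(iters):                             # Lloyd iterations
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)                          # nearest centroid per group
        for c in range(k):                             # recompute centroids
            members = vectors[idx == c]
            if len(members):
                codebook[c] = members.mean(0)

    W_hat = codebook[idx].reshape(rows, cols)          # dequantized weights
    return codebook, idx.astype(np.uint16), W_hat

W = np.random.randn(64, 64).astype(np.float32)
codebook, idx, W_hat = vector_quantize(W)
print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))

Storing only the codebook plus the small per-group indices is what makes such schemes "extreme low-bit"; the reconstruction error printed at the end is the quantity post-training quantization methods try to minimize.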

Efficient training and inference: Techniques for large language models using LLaMA

SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
Enhancing the efficiency of language models involves optimizing their training and
inference processes to reduce computational demands while maintaining high performance …

ABQ-LLM: Arbitrary-bit quantized inference acceleration for large language models

C Zeng, S Liu, Y Xie, H Liu, X Wang, M Wei… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized natural language processing tasks.
However, their practical application is constrained by substantial memory and computational …

HotaQ: Hardware Oriented Token Adaptive Quantization for Large Language Models

X Shen, Z Han, L Lu, Z Kong, P Dong… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have become popular and are widely used in creative ways
because of their powerful capabilities. However, the substantial model size and complexity …
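The HotaQ entry above mentions token-adaptive quantization, but the snippet stops before any detail. As a loose illustration of what per-token quantization means in general, here is a minimal sketch of symmetric round-to-nearest INT8 activation quantization with one scale per token. This is a generic scheme under my own assumptions, not the paper's hardware-oriented method; the names and bit width are illustrative.

# Minimal sketch of per-token dynamic INT8 activation quantization.
# Illustrative only: a generic scheme, not HotaQ's method.
import numpy as np

def quantize_per_token(x, n_bits=8):
    """Quantize activations x of shape (tokens, hidden) with one symmetric
    round-to-nearest scale per token; returns int8 codes and per-token scales."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for INT8
    scales = np.abs(x).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)                  # guard against all-zero tokens
    q = np.clip(np.round(x / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

x = np.random.randn(16, 128).astype(np.float32)        # 16 tokens, hidden size 128
q, s = quantize_per_token(x)
print("max abs dequantization error:", float(np.abs(x - dequantize(q, s)).max()))

Because each token gets its own scale, outlier tokens do not blow up the quantization error of the others, which is the basic motivation behind token-adaptive schemes.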

QEFT: Quantization for Efficient Fine-Tuning of LLMs

C Lee, J Jin, Y Cho, E Park - arXiv preprint arXiv:2410.08661, 2024 - arxiv.org
With the rapid growth in the use of fine-tuning for large language models (LLMs), optimizing
fine-tuning while keeping inference efficient has become highly important. However, this is a …

Impact of ML optimization tactics on greener pre-trained ML models

AG Álvarez, J Castaño, X Franch… - arXiv preprint arXiv …, 2024 - arxiv.org
Background: Given the fast-paced nature of today's technology, which has surpassed
human performance in tasks like image classification, visual reasoning, and English …