A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
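
For readers new to the area, a minimal sketch of the magnitude-pruning baseline that surveys such as this one build upon; the threshold-based variant below is an illustrative assumption, not a method from the survey itself:

    import torch

    def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
        k = int(weight.numel() * sparsity)                      # how many weights to drop
        if k == 0:
            return weight.clone()
        threshold = weight.abs().flatten().kthvalue(k).values   # k-th smallest |w|
        mask = weight.abs() > threshold                         # keep weights above it
        return weight * mask

    w = torch.randn(256, 256)
    w_pruned = magnitude_prune(w, sparsity=0.5)
    print((w_pruned == 0).float().mean())                       # roughly 0.5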

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2024 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
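
The heavy-hitter idea, retaining only the key/value cache entries of tokens that accumulate the most attention, can be sketched as follows. The greedy cumulative-score policy and the fixed recent-token window are simplifying assumptions for illustration, not the paper's exact oracle:

    import torch

    def evict_kv_cache(keys, values, attn, budget, recent=4):
        """Keep a fixed budget of KV entries: a recent-token window plus the
        'heavy hitters' that received the most cumulative attention."""
        T = keys.shape[0]                         # current sequence length
        if T <= budget:
            return keys, values
        received = attn.sum(dim=0)                # [T]: total attention per token
        received[-recent:] = float("inf")         # always retain recent tokens
        keep = torch.topk(received, budget).indices.sort().values
        return keys[keep], values[keep]

    T, d = 32, 8
    keys, values = torch.randn(T, d), torch.randn(T, d)
    attn = torch.softmax(torch.randn(T, T), dim=-1)    # stand-in attention weights
    k_small, v_small = evict_kv_cache(keys, values, attn, budget=16)
    print(k_small.shape)                               # torch.Size([16, 8])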

PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs

M Zimmer, M Andoni, C Spiegel, S Pokutta - arXiv preprint arXiv …, 2023 - arxiv.org
Neural Networks can be efficiently compressed through pruning, significantly reducing
storage and computational demands while maintaining predictive performance. Simple yet …
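
PERP's premise, that expensive full retraining after pruning can be replaced by updating only a tiny parameter subset, can be illustrated by freezing everything except biases and normalization parameters; the exact subset chosen here is an assumption for the sketch:

    import torch
    import torch.nn as nn

    def freeze_all_but_norm_and_bias(model: nn.Module) -> None:
        """Leave only norm parameters and biases trainable after pruning,
        a tiny fraction of the total parameter count."""
        for p in model.parameters():
            p.requires_grad = False
        for m in model.modules():
            if isinstance(m, nn.LayerNorm):        # norm layers stay trainable
                for p in m.parameters():
                    p.requires_grad = True
        for name, p in model.named_parameters():
            if name.endswith("bias"):              # all biases stay trainable
                p.requires_grad = True

    model = nn.Sequential(nn.Linear(64, 64), nn.LayerNorm(64), nn.Linear(64, 10))
    # ... prune the Linear weights here, then retrain only the small subset:
    freeze_all_but_norm_and_bias(model)
    print([n for n, p in model.named_parameters() if p.requires_grad])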

Fast and optimal weight update for pruned large language models

V Boža - arXiv preprint arXiv:2401.02938, 2024 - arxiv.org
Pruning large language models (LLMs) is a challenging task due to their enormous size.
The primary difficulty is fine-tuning the model after pruning, which is needed to recover the …
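
The underlying layer-wise problem is: given a pruning mask, choose the surviving weights so the sparse layer best reproduces the dense layer's outputs on calibration inputs. The naive per-output least-squares solver below states that objective; it is a baseline for illustration, not the paper's fast algorithm:

    import torch

    def solve_output_unit(x, y, mask_col):
        """min_w ||x @ w - y||^2 over the unpruned coordinates of one output
        unit; pruned coordinates stay exactly zero."""
        idx = mask_col.nonzero(as_tuple=True)[0]    # surviving input indices
        w = torch.zeros(mask_col.numel())
        if idx.numel() > 0:
            w[idx] = torch.linalg.lstsq(x[:, idx], y.unsqueeze(1)).solution.squeeze(1)
        return w

    n, d_in, d_out = 128, 32, 16
    x = torch.randn(n, d_in)                        # calibration activations
    w_dense = torch.randn(d_in, d_out)
    y_dense = x @ w_dense                           # dense outputs to reproduce
    mask = torch.rand(d_in, d_out) > 0.5            # an arbitrary pruning mask
    w_sparse = torch.stack(
        [solve_output_unit(x, y_dense[:, j], mask[:, j]) for j in range(d_out)], dim=1
    )
    print(torch.norm(x @ w_sparse - y_dense))       # reconstruction error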

Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for Large Language Models

P Dong, L Li, Z Tang, X Liu, X Pan, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their remarkable capabilities, Large Language Models (LLMs) face deployment
challenges due to their extensive size. Pruning methods drop a subset of weights to …
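
The kind of symbolic saliency metric such a search evolves can be shown with one hand-written instance: score each weight by its magnitude times the norm of the input feature it multiplies (a Wanda-style formula, used here only as a fixed example from the search space, not Pruner-Zero's discovered metric):

    import torch

    def saliency_scores(weight, activations):
        """score_ij = |w_ij| * ||x_j||_2, weight magnitude scaled by the L2
        norm of the corresponding input feature."""
        feat_norms = activations.norm(dim=0)             # [d_in]
        return weight.abs() * feat_norms.unsqueeze(0)    # broadcast over rows

    def prune_per_output(weight, scores, sparsity):
        """Within each output row, zero the lowest-scoring fraction of weights."""
        k = int(weight.shape[1] * sparsity)
        drop = torch.topk(scores, k, dim=1, largest=False).indices
        mask = torch.ones_like(weight)
        mask.scatter_(1, drop, 0.0)
        return weight * mask

    w = torch.randn(64, 128)                 # [d_out, d_in]
    x = torch.randn(256, 128)                # calibration activations
    w_pruned = prune_per_output(w, saliency_scores(w, x), sparsity=0.5)
    print((w_pruned == 0).float().mean())    # roughly 0.5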

Composable Interventions for Language Models

A Kolbeinsson, K O'Brien, T Huang, S Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Test-time interventions for language models can enhance factual accuracy, mitigate harmful
outputs, and improve model efficiency without costly retraining. But despite a flood of new …

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

G Fang, H Yin, S Muralidharan, G Heinrich… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …
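
Semi-structured (N:M) sparsity keeps exactly N nonzero weights in every group of M, a pattern sparse GPU kernels can accelerate. The magnitude-based 2:4 masking below shows only the target format; MaskLLM itself learns the mask end-to-end rather than selecting it by magnitude:

    import torch

    def mask_2_of_4(weight: torch.Tensor) -> torch.Tensor:
        """Keep the 2 largest-magnitude weights in every contiguous group of 4
        along the input dimension (the 2:4 semi-structured pattern)."""
        d_out, d_in = weight.shape
        assert d_in % 4 == 0
        groups = weight.abs().reshape(d_out, d_in // 4, 4)
        keep = torch.topk(groups, k=2, dim=-1).indices     # top-2 per group of 4
        mask = torch.zeros_like(groups)
        mask.scatter_(-1, keep, 1.0)
        return mask.reshape(d_out, d_in)

    w = torch.randn(8, 16)
    m = mask_2_of_4(w)
    print(m.reshape(8, 4, 4).sum(dim=-1))    # every group of 4 sums to exactly 2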

Inference Optimization of Foundation Models on AI Accelerators

Y Park, K Budhathoki, L Chen, JM Kübler… - Proceedings of the 30th …, 2024 - dl.acm.org
Powerful foundation models, including large language models (LLMs), with Transformer
architectures have ushered in a new era of Generative AI across various industries. Industry …

Optimization-based Structural Pruning for Large Language Models without Back-Propagation

Y Gao, Z Liu, W Zhang, B Du, GS Xia - arXiv preprint arXiv:2406.10576, 2024 - arxiv.org
Compared to moderately sized neural network models, structural weight pruning of
Large Language Models (LLMs) poses a novel challenge to the efficiency of the pruning …
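
A minimal illustration of back-propagation-free structural scoring: rank whole output channels by the average activation magnitude they produce on calibration data, using forward passes only. This simple proxy is an assumption for the sketch, not the optimization procedure the paper proposes:

    import torch
    import torch.nn as nn

    @torch.no_grad()                  # forward passes only, no back-propagation
    def channel_scores(layer: nn.Linear, calib_x: torch.Tensor) -> torch.Tensor:
        """Score each output channel by the mean |activation| it produces."""
        return layer(calib_x).abs().mean(dim=0)            # [d_out]

    @torch.no_grad()
    def prune_channels(layer: nn.Linear, scores: torch.Tensor, n_keep: int) -> nn.Linear:
        """Build a smaller Linear layer keeping only the top-scoring channels."""
        keep = torch.topk(scores, n_keep).indices.sort().values
        new = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
        new.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            new.bias.copy_(layer.bias[keep])
        return new

    layer = nn.Linear(64, 32)
    calib = torch.randn(512, 64)
    small = prune_channels(layer, channel_scores(layer, calib), n_keep=16)
    print(small.weight.shape)         # torch.Size([16, 64])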