A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - Transactions of the Association for Computational Linguistics, 2024
Large Language Models (LLMs) have successfully transformed natural language processing
tasks. Yet, their large size and high computational needs pose challenges for …

Compact language models via pruning and knowledge distillation

S Muralidharan, ST Sreenivas, RB Joshi… - The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …
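
This entry pairs structured pruning with knowledge distillation from the original model to recover accuracy in the smaller one. Purely as an illustration of the general idea, and not the paper's exact objective, the sketch below shows a standard temperature-scaled logit-distillation loss; the temperature value and tensor shapes are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student output distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: random logits for a batch of 4 examples over a 32k-token vocabulary.
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
loss = distillation_loss(student, teacher)
```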

BK-SDM: A lightweight, fast, and cheap version of Stable Diffusion

BK Kim, HK Song, T Castells, S Choi - European Conference on Computer Vision, 2025
Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high
computing demands due to billion-scale parameters. To enhance efficiency, recent studies …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint, 2024
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Transformer layers as painters

Q Sun, M Pickett, AK Nain, L Jones - arXiv preprint arXiv:2407.09298, 2024
Despite their nearly universal adoption for large language models, the internal workings of
transformers are not well understood. We aim to better understand the impact of removing or …

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

RS Sukthanker, A Zela, B Staffler, A Klein… - arXiv preprint, 2024
The increasing size of language models necessitates a thorough analysis across multiple
dimensions to assess trade-offs among crucial hardware metrics such as latency, energy …

Mixture-of-modules: Reinventing transformers as dynamic assemblies of modules

Z Gong, A Lv, J Guan, J Yan, W Wu, H Zhang… - arXiv preprint, 2024
Is it always necessary to compute tokens from shallow to deep layers in Transformers? The
continued success of vanilla Transformers and their variants suggests an undoubted "yes" …

LAPTOP-Diff: Layer pruning and normalized distillation for compressing diffusion models

D Zhang, S Li, C Chen, Q Xie, H Lu - arXiv preprint arXiv:2404.11098, 2024
In the era of AIGC, the demand for low-budget or even on-device applications of diffusion
models has emerged. For compressing Stable Diffusion models (SDMs), several …

A deeper look at depth pruning of LLMs

SA Siddiqui, X Dong, G Heinrich, T Breuel… - arXiv preprint, 2024
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …
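
Depth (block) pruning removes whole transformer blocks rather than individual weights. As a rough sketch only, not the method of this or any of the papers above, the code below deletes a contiguous range of decoder blocks from a Hugging Face LLaMA-style model; the attribute path model.model.layers, the checkpoint name, and the pruned index range [20, 26) are assumptions that vary by architecture and by how block importance is scored.

```python
# Illustrative depth (block) pruning: delete a contiguous range of transformer blocks.
# Assumes a LLaMA-style Hugging Face model whose decoder blocks live in
# `model.model.layers` (an nn.ModuleList); other architectures use different paths.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

def drop_blocks(model: nn.Module, start: int, end: int) -> nn.Module:
    """Remove blocks with indices in [start, end) and reindex the survivors."""
    kept = [blk for i, blk in enumerate(model.model.layers) if not (start <= i < end)]
    for new_idx, blk in enumerate(kept):
        # Attention modules may cache their layer index for the KV cache; keep it consistent.
        if hasattr(blk.self_attn, "layer_idx"):
            blk.self_attn.layer_idx = new_idx
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)
    return model

# Hypothetical usage: prune six of the deeper blocks, then fine-tune to recover quality.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             torch_dtype=torch.float16)
model = drop_blocks(model, start=20, end=26)
```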