A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models

H Sharma, P Dhingra, JR Doppa, U Ogras… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized deep learning and generative modeling, enabling
unprecedented advancements in natural language processing tasks. However, the size of …

Hardware-software co-design enabling static and dynamic sparse attention mechanisms

J Zhao, P Zeng, G Shen, Q Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …

FedSpaLLM: Federated Pruning of Large Language Models

G Bai, Y Li, Z Li, L Zhao, K Kim - arXiv preprint arXiv:2410.14852, 2024 - arxiv.org
Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to
deploy due to their high computational and storage demands. Pruning can reduce model …

EdgeTran: Device-aware co-search of transformers for efficient inference on mobile edge platforms

S Tuli, NK Jha - IEEE Transactions on Mobile Computing, 2023 - ieeexplore.ieee.org
Automated design of efficient transformer models has recently attracted significant attention
from industry and academia. However, most works only focus on certain metrics while …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

A survey on sparsity exploration in transformer-based accelerators

KAA Fuad, L Chen - Electronics, 2023 - mdpi.com
Transformer models have emerged as the state-of-the-art in many natural language
processing and computer vision applications due to their capability of attending to longer …
