A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models

H Sharma, P Dhingra, JR Doppa, U Ogras… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized deep learning and generative modeling, enabling
unprecedented advancements in natural language processing tasks. However, the size of …

Hardware-software co-design enabling static and dynamic sparse attention mechanisms

J Zhao, P Zeng, G Shen, Q Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The attention mechanisms of transformers effectively extract pertinent information from the
input sequence. However, the quadratic complexity of self-attention incurs heavy …

FedSpaLLM: Federated Pruning of Large Language Models

G Bai, Y Li, Z Li, L Zhao, K Kim - arXiv preprint arXiv:2410.14852, 2024 - arxiv.org
Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to
deploy due to their high computational and storage demands. Pruning can reduce model …

EdgeTran: Device-aware co-search of transformers for efficient inference on mobile edge platforms

S Tuli, NK Jha - IEEE Transactions on Mobile Computing, 2023 - ieeexplore.ieee.org
Automated design of efficient transformer models has recently attracted significant attention
from industry and academia. However, most works only focus on certain metrics while …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

A survey on sparsity exploration in transformer-based accelerators

KAA Fuad, L Chen - Electronics, 2023 - mdpi.com
Transformer models have emerged as the state-of-the-art in many natural language
processing and computer vision applications due to their capability of attending to longer …
