LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …
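For context on the roofline model named in this title: it bounds attainable throughput by the lesser of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch of that standard bound follows; the hardware figures (`peak_flops`, `mem_bw`) are illustrative assumptions, not numbers taken from the survey.

```python
# Minimal sketch of the standard roofline bound as commonly applied to LLM
# inference analysis. Hardware numbers are assumed for illustration only.

def roofline_bound(arithmetic_intensity: float,
                   peak_flops: float = 312e12,      # assumed peak compute, FLOP/s
                   mem_bw: float = 2.0e12) -> float:  # assumed memory bandwidth, B/s
    """Attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

# Decode-phase GEMVs in LLM inference have low arithmetic intensity (a few
# FLOP/byte) and land in the memory-bound region; large prefill GEMMs sit past
# the ridge point in the compute-bound region.
print(roofline_bound(2.0))    # memory-bound: limited by bandwidth
print(roofline_bound(300.0))  # compute-bound: limited by peak FLOP/s
```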

Large language model inference acceleration: A comprehensive hardware perspective

J Li, J Xu, S Huang, Y Chen, W Li, J Liu, Y Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …

Resource-efficient Algorithms and Systems of Foundation Models: A Survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion models,
and large language model-based multimodal models, are revolutionizing the entire machine …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

New solutions on LLM acceleration, optimization, and application

Y Huang, LJ Wan, H Ye, M Jha, J Wang, Y Li… - Proceedings of the 61st …, 2024 - dl.acm.org
Large Language Models (LLMs) have revolutionized a wide range of applications with their
strong human-like understanding and creativity. Due to the continuously growing model size …

LlamaF: An efficient Llama2 architecture accelerator on embedded FPGAs

H Xu, Y Li, S Ji - 2024 IEEE 10th World Forum on Internet of …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have demonstrated remarkable abilities in natural language
processing. However, their deployment on resource-constrained embedded devices …

Efficient training and inference: Techniques for large language models using Llama

SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
Enhancing the efficiency of language models involves optimizing their training and
inference processes to reduce computational demands while maintaining high performance …

EdgeLLM: A highly efficient CPU-FPGA heterogeneous edge accelerator for large language models

M Huang, A Shen, K Li, H Peng, B Li, H Yu - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancements in artificial intelligence (AI), particularly Large Language
Models (LLMs), have profoundly affected our daily work and forms of communication …

A survey of small language models

C Van Nguyen, X Shen, R Aponte, Y Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
Small Language Models (SLMs) have become increasingly important due to their efficiency
and ability to perform various language tasks with minimal computational resources …

FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs

E Kabir, MA Kabir, ARJ Downey, JD Bakos… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer neural networks (TNNs) are being applied across a widening range of
application domains, including natural language processing (NLP), machine translation, and …
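For context on the operation FAMOUS targets, below is a minimal NumPy sketch of standard scaled dot-product attention, softmax(QK^T/sqrt(d))V. It does not reproduce the paper's FPGA mapping, tiling, or precision choices; shapes and names are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(Q K^T / sqrt(d)) V, with (seq, d) inputs."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (seq_q, seq_k) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v                                   # (seq_q, d) output

# Illustrative sizes only; hardware accelerators tile these matrices on-chip.
q = np.random.randn(8, 64)
k = np.random.randn(8, 64)
v = np.random.randn(8, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # (8, 64)
```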