LinGCN: Structural linearized graph convolutional network for homomorphically encrypted inference

H Peng, R Ran, Y Luo, J Zhao… - Advances in …, 2024 - proceedings.neurips.cc
The growth of Graph Convolution Network (GCN) model sizes has revolutionized
numerous applications, surpassing human performance in areas such as personal …

Understanding the potential of FPGA-based spatial acceleration for large language model inference

H Chen, J Zhang, Y Du, S Xiang, Z Yue… - ACM Transactions on …, 2024 - dl.acm.org
Recent advancements in large language models (LLMs) boasting billions of parameters
have generated a significant demand for efficient deployment in inference workloads. While …

A Review on the emerging technology of TinyML

V Tsoukas, A Gkogkidis, E Boumpa… - ACM Computing …, 2024 - dl.acm.org
Tiny Machine Learning (TinyML) is an emerging technology proposed by the scientific
community for developing autonomous and secure devices that can gather, process, and …

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

EdgeLLM: A highly efficient CPU-FPGA heterogeneous edge accelerator for large language models

M Huang, A Shen, K Li, H Peng, B Li, H Yu - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancements in artificial intelligence (AI), particularly the Large Language
Models (LLMs), have profoundly affected our daily work and communication forms …

A survey on hardware accelerators for large language models

C Kachris - arXiv preprint arXiv:2401.09890, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as powerful tools for natural language
processing tasks, revolutionizing the field with their ability to understand and generate …

Resource Efficient Deep Learning Hardware Watermarks with Signature Alignment

J Clements, Y Lao - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Deep learning intellectual properties (IPs) are high-value assets that are frequently
susceptible to theft. This vulnerability has led to significant interest in defending the field's …

A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

I Okubo, K Sugiura, H Matsutani - arXiv preprint arXiv:2401.02721, 2024 - arxiv.org
The Transformer is an emerging neural network model with an attention mechanism. It has been
adopted for various tasks and achieves favorable accuracy compared to CNNs and RNNs …

A theoretical and empirical exploration of TileTrans for effective tile pruning

Y Li, F Ino - Knowledge-Based Systems, 2024 - Elsevier
In this paper, we propose a reparameterization method that is capable of transforming the
attention layer of deep neural networks (DNNs) for reducing the loss of tile pruning. The …

Applications of Pruning Methods in Natural Language Processing

M Touheed, U Zubair, D Sabir, A Hassan… - IEEE …, 2024 - ieeexplore.ieee.org
Deep neural networks (DNNs) are in high demand because of their widespread applications
in natural language processing, image processing, and a lot of other domains. However …