LinGCN: Structural linearized graph convolutional network for homomorphically encrypted inference

H Peng, R Ran, Y Luo, J Zhao… - Advances in …, 2024 - proceedings.neurips.cc
The growth of Graph Convolution Network (GCN) model sizes has revolutionized
numerous applications, surpassing human performance in areas such as personal …

Understanding the potential of FPGA-based spatial acceleration for large language model inference

H Chen, J Zhang, Y Du, S Xiang, Z Yue… - ACM Transactions on …, 2024 - dl.acm.org
Recent advancements in large language models (LLMs) boasting billions of parameters
have generated a significant demand for efficient deployment in inference workloads. While …

A Review on the emerging technology of TinyML

V Tsoukas, A Gkogkidis, E Boumpa… - ACM Computing …, 2024 - dl.acm.org
Tiny Machine Learning (TinyML) is an emerging technology proposed by the scientific
community for developing autonomous and secure devices that can gather, process, and …

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

BJ Kang, HI Lee, SK Yoon, YC Kim, SB Jeong… - Journal of Systems …, 2024 - Elsevier
Recently, transformer-based models have achieved remarkable success in various fields,
such as computer vision, speech recognition, and natural language processing. However …

EdgeLLM: A highly efficient CPU-FPGA heterogeneous edge accelerator for large language models

M Huang, A Shen, K Li, H Peng, B Li, H Yu - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancements in artificial intelligence (AI), particularly the Large Language
Models (LLMs), have profoundly affected our daily work and communication forms …

A survey on hardware accelerators for large language models

C Kachris - arXiv preprint arXiv:2401.09890, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as powerful tools for natural language
processing tasks, revolutionizing the field with their ability to understand and generate …

Resource Efficient Deep Learning Hardware Watermarks with Signature Alignment

J Clements, Y Lao - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Deep learning intellectual properties (IPs) are high-value assets that are frequently
susceptible to theft. This vulnerability has led to significant interest in defending the field's …

A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

I Okubo, K Sugiura, H Matsutani - arXiv preprint arXiv:2401.02721, 2024 - arxiv.org
The Transformer is an emerging neural network model with an attention mechanism. It has been
adopted for various tasks and achieves favorable accuracy compared to CNNs and RNNs …

A theoretical and empirical exploration of TileTrans for effective tile pruning

Y Li, F Ino - Knowledge-Based Systems, 2024 - Elsevier
In this paper, we propose a reparameterization method that is capable of transforming the
attention layer of deep neural networks (DNNs) for reducing the loss of tile pruning. The …

Applications of Pruning Methods in Natural Language Processing

M Touheed, U Zubair, D Sabir, A Hassan… - IEEE …, 2024 - ieeexplore.ieee.org
Deep neural networks (DNNs) are in high demand because of their widespread applications
in natural language processing, image processing, and a lot of other domains. However …