A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

A review of the optimal design of neural networks based on FPGA

C Wang, Z Luo - Applied Sciences, 2022 - mdpi.com
Deep learning based on neural networks has been widely used in image recognition,
speech recognition, natural language processing, autonomous driving, and other fields and …

M³ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design

Z Fan, R Sarkar, Z Jiang, T Chen… - Advances in …, 2022 - proceedings.neurips.cc
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often
lets those tasks learn better jointly. Multi-tasking models have become successful and often …
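
The snippet above defines MTL as packing several learned tasks into a single model. A minimal sketch of that shared-backbone idea in PyTorch (module names and sizes are illustrative; M³ViT itself uses mixture-of-experts blocks inside a vision transformer, which this toy model does not reproduce):

```python
import torch
import torch.nn as nn

class SharedBackboneMTL(nn.Module):
    """Toy multi-task model: one shared encoder, one lightweight head per task."""
    def __init__(self, in_dim=64, hidden=128, task_out_dims=(10, 1)):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One output head per task, e.g. a 10-way classifier and a regressor.
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_out_dims)

    def forward(self, x):
        feats = self.backbone(x)              # features shared by all tasks
        return [head(feats) for head in self.heads]

outputs = SharedBackboneMTL()(torch.randn(8, 64))
print([o.shape for o in outputs])             # one output tensor per task
```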

AutoReP: Automatic ReLU replacement for fast private network inference

H Peng, S Huang, T Zhou, Y Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
The growth of the Machine-Learning-As-A-Service (MLaaS) market has highlighted clients'
data privacy and security issues. Private inference (PI) techniques using cryptographic …
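
Non-linear layers such as ReLU dominate the cost of cryptographic private inference, so a common direction, which the entry above automates, is to replace selected ReLUs with cheaper polynomial activations. A hedged sketch of the replacement step only (coefficients are illustrative, and AutoReP's contribution is deciding which ReLUs to replace, which this sketch skips by replacing all of them):

```python
import torch.nn as nn

class QuadraticActivation(nn.Module):
    """Degree-2 polynomial stand-in for ReLU; polynomials avoid the expensive
    comparison operations that ReLU requires under many crypto protocols."""
    def __init__(self, a=0.25, b=0.5, c=0.0):      # illustrative coefficients
        super().__init__()
        self.a, self.b, self.c = a, b, c

    def forward(self, x):
        return self.a * x * x + self.b * x + self.c

def replace_relus(module: nn.Module) -> nn.Module:
    """Recursively swap every nn.ReLU for the polynomial surrogate."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, QuadraticActivation())
        else:
            replace_relus(child)
    return module

net = replace_relus(nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10)))
print(net)
```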

LinGCN: Structural linearized graph convolutional network for homomorphically encrypted inference

H Peng, R Ran, Y Luo, J Zhao… - Advances in …, 2024 - proceedings.neurips.cc
The growth of Graph Convolution Network (GCN) model sizes has revolutionized
numerous applications, surpassing human performance in areas such as personal …

An algorithm–hardware co-optimized framework for accelerating N:M sparse transformers

C Fang, A Zhou, Z Wang - IEEE Transactions on Very Large …, 2022 - ieeexplore.ieee.org
The Transformer has been an indispensable staple in deep learning. However, for real-life
applications, it is very challenging to deploy efficient Transformers due to the immense …
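
N:M sparsity keeps at most N non-zero weights in every group of M consecutive weights (2:4 being the common hardware-supported case). Purely as an illustration of that pattern, and not of the paper's co-optimized framework, a one-shot magnitude projection onto 2:4 sparsity might look like:

```python
import torch

def nm_prune(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude entries in each group of m consecutive
    weights along the last dimension; zero out the rest."""
    assert weight.shape[-1] % m == 0
    groups = weight.reshape(-1, m)
    topk = groups.abs().topk(n, dim=1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(1, topk, True)
    return (groups * mask).reshape(weight.shape)

w_sparse = nm_prune(torch.randn(8, 16))
print((w_sparse == 0).float().mean())          # ~0.5 for a 2:4 pattern
```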

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining

H Peng, S Huang, S Chen, B Li, T Geng, A Li… - Proceedings of the 59th …, 2022 - dl.acm.org
Transformers are considered one of the most important deep learning models since 2018, in
part because they establish state-of-the-art (SOTA) records and could potentially replace …
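
Only as an illustration of the sparse-attention half of that co-design (the specific sparsity pattern and the dynamic pipelining are the paper's own), a banded attention in which each token attends to a fixed local window can be written as:

```python
import torch
import torch.nn.functional as F

def windowed_attention(q, k, v, window: int = 4):
    """Scaled dot-product attention where token i may only attend to tokens
    within `window` positions of i; all other scores are masked out."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    idx = torch.arange(q.shape[-2])
    keep = (idx[:, None] - idx[None, :]).abs() <= window     # band mask
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 16, 32)              # (batch, tokens, dim)
print(windowed_attention(q, k, v).shape)        # torch.Size([1, 16, 32])
```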

VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer

M Sun, H Ma, G Kang, Y Jiang, T Chen, X Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer architectures with attention mechanisms have achieved success in Natural
Language Processing (NLP), and Vision Transformers (ViTs) have recently extended the …
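
Low-bit deployment rests on quantizing weights (and activations) to a small number of levels. A hedged sketch of generic uniform symmetric weight quantization to b bits, not of VAQF's automatic FPGA flow:

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int = 4):
    """Fake-quantize a tensor: round to signed integers with a single
    per-tensor scale, then map back to floats for simulation."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale, q.to(torch.int8), scale

w = torch.randn(64, 64)
w_dq, w_int, scale = quantize_symmetric(w, bits=4)
print(w_int.unique().numel(), float((w - w_dq).abs().max()))   # levels used, max error
```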

ViA: A novel vision-transformer accelerator based on FPGA

T Wang, L Gong, C Wang, Y Yang… - … on Computer-Aided …, 2022 - ieeexplore.ieee.org
Since Google proposed the Transformer in 2017, it has driven significant progress in natural
language processing (NLP). However, this progress comes at the cost of a large amount of …

Towards sparsification of graph neural networks

H Peng, D Gurevin, S Huang, T Geng… - 2022 IEEE 40th …, 2022 - ieeexplore.ieee.org
As real-world graphs expand in size, larger GNN models with billions of parameters are
deployed. The high parameter count of such models makes training and inference on graphs …
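
Sparsification in this setting means removing a large fraction of weights (and often edges) so that very large GNNs become cheaper to train and serve. A minimal magnitude-pruning sketch over a model's weight matrices, shown as a generic baseline rather than the paper's method:

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.8) -> None:
    """Zero out the smallest-magnitude fraction `sparsity` of each 2D weight in place."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:
                continue                        # leave biases untouched
            k = int(p.numel() * sparsity)
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_(p.abs() > threshold)

# Stand-in for a GNN layer stack (illustrative; a real GCN layer also consumes the adjacency).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
magnitude_prune(model, sparsity=0.8)
weights = [p for p in model.parameters() if p.dim() >= 2]
print(sum((p == 0).sum().item() for p in weights) / sum(p.numel() for p in weights))  # ≈ 0.8
```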