Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks owing to its hybrid attention-convolution architecture that …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

A 95.6-TOPS/W deep learning inference accelerator with per-vector scaled 4-bit quantization in 5 nm

B Keller, R Venkatesan, S Dai, SG Tell… - IEEE Journal of Solid …, 2023 - ieeexplore.ieee.org
The energy efficiency of deep neural network (DNN) inference can be improved with custom
accelerators. DNN inference accelerators often employ specialized hardware techniques to …
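
The per-vector scaling named in the title is easy to sketch in software: instead of one quantization scale per tensor or per channel, each short vector of elements gets its own scale, which tightens the dynamic range that 4 bits must cover. Below is a minimal NumPy sketch of symmetric per-vector 4-bit quantization; the vector length of 16 and the single-level scales are illustrative assumptions, not the exact scheme of this accelerator.

```python
import numpy as np

def quantize_per_vector(w, vec_len=16, bits=4):
    # One scale per vec_len-element vector instead of per tensor or channel.
    # Illustrative sketch: single-level float scales, symmetric range.
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    flat = w.reshape(-1, vec_len)                   # assumes size % vec_len == 0
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0.0, 1.0, scales)   # guard all-zero vectors
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales, shape):
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(64, 64).astype(np.float32)
q, scales = quantize_per_vector(w)
print("mean abs error:", np.abs(w - dequantize(q, scales, w.shape)).mean())
```

Because each scale only has to cover 16 values, outliers in one vector no longer inflate the quantization step for the rest of the tensor.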

Approximate computing and the efficient machine learning expedition

J Henkel, H Li, A Raghunathan, MB Tahoori… - Proceedings of the 41st …, 2022 - dl.acm.org
Approximate computing (AxC) has long been accepted as a design alternative for efficient
system implementation at the cost of relaxed accuracy requirements. Despite the AxC …

A Mixed-Precision Transformer Accelerator With Vector Tiling Systolic Array for License Plate Recognition in Unconstrained Scenarios

J Li, D Yan, F He, Z Dong… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Power efficiency for license plate recognition (LPR) under unconstrained scenarios is a
crucial factor in many edge-based real-world applications, e.g., autonomous vehicles whose …

SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference

W Wang, S Zhou, W Sun, P Sun… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
Transformers have shown remarkable performance in both natural language processing
(NLP) and computer vision (CV) tasks. However, their real-time inference speed and …
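
For context, these are the two operations the title refers to, written as the standard floating-point reference in NumPy; the exponentials, divisions, and reciprocal square roots in them are what make naive implementations costly in hardware. This is the textbook formulation, not SOLE's hardware-friendly variant.

```python
import numpy as np

def softmax(x, axis=-1):
    # Max subtraction keeps exp() in range; the row-wise exponentials
    # and the division are the expensive parts in hardware.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layernorm(x, gamma, beta, eps=1e-5):
    # Two reductions (mean, variance) plus a reciprocal square root per row.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

scores = np.random.randn(4, 8)
print(softmax(scores).sum(axis=-1))   # each row sums to 1
```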

Enabling and accelerating dynamic vision transformer inference for real-time applications

K Sreedhar, J Clemons, R Venkatesan… - arXiv preprint arXiv …, 2022 - arxiv.org
Many state-of-the-art deep learning models for computer vision tasks are based on the
transformer architecture. Such models can be computationally expensive and are typically …

Genetic quantization-aware approximation for non-linear operations in Transformers

P Dong, Y Tan, D Zhang, T Ni, X Liu, Y Liu… - Proceedings of the 61st …, 2024 - dl.acm.org
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring
substantial and frequently underestimated hardware costs. Previous state-of-the-art works …
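
A common baseline for this line of work is fitting a low-degree polynomial to a non-linear activation over a bounded input range, then evaluating only multiplies and adds at inference time. The sketch below fits GELU with ordinary least squares via np.polyfit; the degree, range, and fitting method are illustrative stand-ins for the genetic, quantization-aware search the paper describes.

```python
import numpy as np
from math import erf

def gelu(x):
    # Exact GELU via the Gaussian CDF.
    return 0.5 * x * (1.0 + np.vectorize(erf)(x / np.sqrt(2.0)))

xs = np.linspace(-4.0, 4.0, 1024)
coeffs = np.polyfit(xs, gelu(xs), deg=4)    # least squares, not genetic search
poly = np.poly1d(coeffs)
print("max abs error on [-4, 4]:", np.abs(poly(xs) - gelu(xs)).max())
```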

ViTA: A Highly Efficient Dataflow and Architecture for Vision Transformers

C Chen, L Li, MMS Aly - 2024 Design, Automation & Test in …, 2024 - ieeexplore.ieee.org
Transformer-based DNNs have dominated several AI fields with remarkable performance.
However, scaling Transformer models up to trillions of parameters and computation …

Auto-LUT: Auto Approximation of Non-Linear Operations for Neural Networks on FPGA

H Lu, Q Mei, K Wang - 2023 IEEE International Symposium on …, 2023 - ieeexplore.ieee.org
The approximation of non-linear operations can simplify the logic design and save system
resources during neural network inference on a Field-Programmable Gate Array (FPGA) …
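
The general pattern behind LUT approximation is easy to demonstrate: sample the function into a table once, then replace every runtime evaluation with a lookup plus linear interpolation. The sketch below does this for a sigmoid with a 256-entry table; the function, range, and table size are illustrative assumptions, not Auto-LUT's automated selection.

```python
import numpy as np

def build_lut(fn, lo, hi, entries=256):
    # Sample the function once into a uniform table over [lo, hi].
    xs = np.linspace(lo, hi, entries)
    return xs, fn(xs)

def lut_eval(x, xs, ys):
    # Lookup with piecewise-linear interpolation; np.interp clamps
    # out-of-range inputs to the endpoint values.
    return np.interp(x, xs, ys)

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
xs, ys = build_lut(sigmoid, -8.0, 8.0)
test = np.random.uniform(-10.0, 10.0, 10000)
print("max abs error:", np.abs(lut_eval(test, xs, ys) - sigmoid(test)).max())
```

On an FPGA the table lives in block RAM and the interpolation is one multiply-add, which is the resource saving the abstract alludes to.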