Accelerating Transformer-based deep learning models on FPGAs using column balanced block pruning

H Peng, S Huang, T Geng, A Li, W Jiang… - … on Quality Electronic …, 2021 - ieeexplore.ieee.org
Although Transformer-based language representations achieve state-of-the-art accuracy on
various natural language processing (NLP) tasks, the large model size has been …
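As a rough illustration of the column balanced block pruning named in the title, the numpy sketch below prunes whole blocks of a weight matrix while keeping the same number of surviving blocks in every block column, so the per-column workload stays balanced. The block size, L2-norm ranking, and keep ratio are illustrative assumptions, not the paper's exact algorithm or FPGA mapping.

import numpy as np

def column_balanced_block_prune(w, block=4, keep_ratio=0.5):
    # Zero out whole block x block tiles of w, keeping the same number of
    # tiles in every block column so per-column workload stays balanced.
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    n_block_rows = rows // block
    keep = max(1, int(round(n_block_rows * keep_ratio)))  # tiles kept per block column
    mask = np.zeros_like(w)
    for j in range(cols // block):
        col = w[:, j * block:(j + 1) * block]
        # Rank the tiles in this block column by their L2 norm.
        norms = np.array([np.linalg.norm(col[i * block:(i + 1) * block])
                          for i in range(n_block_rows)])
        for i in np.argsort(norms)[-keep:]:  # keep the strongest tiles
            mask[i * block:(i + 1) * block, j * block:(j + 1) * block] = 1.0
    return w * mask

# Example: an 8x8 matrix pruned to one surviving 4x4 tile per block column.
w = np.random.default_rng(0).standard_normal((8, 8))
print(column_balanced_block_prune(w, block=4, keep_ratio=0.5))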

Accommodating Transformer onto FPGA: Coupling the balanced model compression and FPGA-implementation optimization

P Qi, Y Song, H Peng, S Huang, Q Zhuge… - Proceedings of the 2021 …, 2021 - dl.acm.org
Recently, Transformers have gradually gained popularity and performed outstandingly on many Natural
Language Processing (NLP) tasks. However, Transformers suffer from heavy computation …

A length adaptive algorithm-hardware co-design of Transformer on FPGA through sparse attention and dynamic pipelining

H Peng, S Huang, S Chen, B Li, T Geng, A Li… - Proceedings of the 59th …, 2022 - dl.acm.org
Transformers have been considered among the most important deep learning models since 2018, in
part because they establish state-of-the-art (SOTA) records and could potentially replace …

Hardware acceleration of fully quantized BERT for efficient natural language processing

Z Liu, G Li, J Cheng - 2021 Design, Automation & Test in …, 2021 - ieeexplore.ieee.org
BERT is the most recent Transformer-based model that achieves state-of-the-art
performance in various NLP tasks. In this paper, we investigate the hardware acceleration of …

E.T.: re-thinking self-attention for Transformer models on GPUs

S Chen, S Huang, S Pandey, B Li, GR Gao… - Proceedings of the …, 2021 - dl.acm.org
Transformer-based deep learning models have become a ubiquitous vehicle to drive a
variety of Natural Language Processing (NLP) related tasks beyond their accuracy ceiling …

Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity

S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan… - Proceedings of the …, 2019 - dl.acm.org
Neural networks based on Long Short-Term Memory (LSTM) are widely deployed in latency-
sensitive language and speech applications. To speed up LSTM inference, previous …
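As a rough illustration of the bank-balanced sparsity named in the title, the numpy sketch below splits each weight row into equal-size banks and keeps the same number of largest-magnitude weights in every bank, so parallel lanes see identical work. The bank size and sparsity level are illustrative assumptions, not the paper's training procedure or FPGA decoding scheme.

import numpy as np

def bank_balanced_prune(w, bank=4, sparsity=0.5):
    # Split every row into banks of `bank` weights and keep the same number
    # of largest-magnitude weights in each bank.
    rows, cols = w.shape
    assert cols % bank == 0
    keep = bank - int(round(bank * sparsity))  # weights kept per bank
    out = np.zeros_like(w)
    banks = w.reshape(rows, cols // bank, bank)
    for r in range(rows):
        for b in range(cols // bank):
            idx = np.argsort(np.abs(banks[r, b]))[-keep:]
            out[r, b * bank + idx] = banks[r, b, idx]
    return out

# Example: every bank of four weights keeps exactly two nonzeros.
w = np.random.default_rng(0).standard_normal((2, 8))
print(bank_balanced_prune(w, bank=4, sparsity=0.5))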

FTRANS: energy-efficient acceleration of Transformers using FPGA

B Li, S Pandey, H Fang, Y Lyv, J Li, J Chen… - Proceedings of the …, 2020 - dl.acm.org
In natural language processing (NLP), the "Transformer" architecture was proposed as the
first transduction model relying entirely on self-attention mechanisms without using …

Q8BERT: Quantized 8-bit BERT

O Zafrir, G Boudoukh, P Izsak… - 2019 Fifth Workshop on …, 2019 - ieeexplore.ieee.org
Recently, pre-trained Transformer [1] based language models such as BERT [2] and GPT [3]
have shown great improvement in many Natural Language Processing (NLP) tasks …
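As a rough illustration of 8-bit quantization for a model like the one this entry describes, the numpy sketch below applies symmetric per-tensor quantization and dequantization. The scale choice and simple rounding are illustrative assumptions, not Q8BERT's quantization-aware training recipe.

import numpy as np

def quantize_int8(x):
    # Map float values to int8 codes with a symmetric per-tensor scale.
    scale = np.max(np.abs(x)) / 127.0 if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the int8 codes.
    return q.astype(np.float32) * scale

# Example: the round-trip error stays within half a quantization step.
w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))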

Hardware accelerator for multi-head attention and position-wise feed-forward in the transformer

S Lu, M Wang, S Liang, J Lin… - 2020 IEEE 33rd …, 2020 - ieeexplore.ieee.org
Designing hardware accelerators for deep neural networks (DNNs) has been in high demand.
Nonetheless, most existing accelerators are built for either convolutional neural …

ReTransformer: ReRAM-based processing-in-memory architecture for transformer acceleration

X Yang, B Yan, H Li, Y Chen - … of the 39th International Conference on …, 2020 - dl.acm.org
Transformer has emerged as a popular deep neural network (DNN) model for Natural
Language Processing (NLP) applications and demonstrated excellent performance in …