Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

FPGA HLS today: successes, challenges, and opportunities

J Cong, J Lau, G Liu, S Neuendorffer, P Pan… - ACM Transactions on …, 2022 - dl.acm.org
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …

Unsupervised anomaly detection with LSTM autoencoders using statistical data-filtering

S Maleki, S Maleki, NR Jennings - Applied Soft Computing, 2021 - Elsevier
To address one of the most challenging industry problems, we develop an enhanced
training algorithm for anomaly detection in unlabelled sequential data such as time-series …

Deep neural network approximation for custom hardware: Where we've been, where we're going

E Wang, JJ Davis, R Zhao, HC Ng, X Niu… - ACM Computing …, 2019 - dl.acm.org
Deep neural networks have proven to be particularly effective in visual and audio
recognition tasks. Existing models tend to be computationally expensive and memory …

Flextensor: An automatic schedule exploration and optimization framework for tensor computation on heterogeneous system

S Zheng, Y Liang, S Wang, R Chen… - Proceedings of the Twenty …, 2020 - dl.acm.org
Tensor computation plays a paramount role in a broad range of domains, including machine
learning, data analytics, and scientific computing. The wide adoption of tensor computation …

FTRANS: energy-efficient acceleration of transformers using FPGA

B Li, S Pandey, H Fang, Y Lyv, J Li, J Chen… - Proceedings of the …, 2020 - dl.acm.org
In natural language processing (NLP), the" Transformer" architecture was proposed as the
first transduction model replying entirely on self-attention mechanisms without using …
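The self-attention mechanism this entry refers to can be sketched as scaled dot-product attention; the following is a minimal NumPy illustration of the math, not the FTRANS hardware mapping, and the projection matrices here are arbitrary placeholders:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                     # toy sequence: 4 tokens, dim 8
w = [rng.standard_normal((8, 8)) for _ in range(3)]
out = self_attention(x, *w)
```

Every output token is a convex combination of all value vectors, which is why the architecture needs no recurrence or convolution.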

Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity

S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan… - Proceedings of the …, 2019 - dl.acm.org
Neural networks based on Long Short-Term Memory (LSTM) are widely deployed in latency-
sensitive language and speech applications. To speed up LSTM inference, previous …
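The bank-balanced sparsity named in this title can be sketched as follows; this is a toy software illustration under the assumption that each weight row is split into fixed-width banks and the same number of largest-magnitude entries is kept per bank, so that banks map evenly onto parallel hardware lanes (the bank width and keep count below are arbitrary):

```python
import numpy as np

def bank_balanced_prune(w, bank_size, keep):
    """Keep the `keep` largest-magnitude weights in every `bank_size`-wide bank.

    Each bank ends up with identical sparsity, which balances the load
    across parallel multiply-accumulate lanes.
    """
    out = np.zeros_like(w)
    banks = w.reshape(-1, bank_size)        # split rows into fixed banks
    res = out.reshape(-1, bank_size)        # view into the output array
    for b, r in zip(banks, res):
        top = np.argsort(np.abs(b))[-keep:] # indices of largest-magnitude entries
        r[top] = b[top]
    return out

w = np.arange(1.0, 17.0).reshape(4, 4)
sparse = bank_balanced_prune(w, bank_size=4, keep=2)
```

Unlike unstructured pruning, every bank has exactly the same number of nonzeros, so no lane sits idle waiting for a denser bank to finish.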

A high-speed and low-complexity architecture for softmax function in deep learning

M Wang, S Lu, D Zhu, J Lin… - 2018 IEEE Asia Pacific …, 2018 - ieeexplore.ieee.org
Recently, significant improvement has been achieved for hardware architecture design of
deep neural networks (DNNs). However, the hardware implementation of one widely used …
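The softmax function that such hardware architectures approximate can be written as a short reference implementation; this is the standard numerically stable form, not the paper's circuit design:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max before exponentiating."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())   # the shift prevents overflow in exp
    return e / e.sum()

p = softmax([2.0, 1.0, 0.1])
```

The exponential and the division are the costly operations in hardware, which is why dedicated architectures replace them with lookup tables or log-domain arithmetic.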

Accelerating transformer-based deep learning models on FPGAs using column balanced block pruning

H Peng, S Huang, T Geng, A Li, W Jiang… - … on Quality Electronic …, 2021 - ieeexplore.ieee.org
Although Transformer-based language representations achieve state-of-the-art accuracy on
various natural language processing (NLP) tasks, the large model size has been …

Accelerating neural network inference on FPGA-based platforms—A survey

R Wu, X Guo, J Du, J Li - Electronics, 2021 - mdpi.com
The breakthrough of deep learning has started a technological revolution in various areas
such as object identification, image/video recognition and semantic segmentation. Neural …