Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Admm-nn: An algorithm-hardware co-design framework of dnns using alternating direction methods of multipliers

A Ren, T Zhang, S Ye, J Li, W Xu, X Qian, X Lin… - Proceedings of the …, 2019 - dl.acm.org
Model compression is an important technique to facilitate efficient embedded and hardware
implementations of deep neural networks (DNNs); a number of prior works are dedicated to …

An overview of energy-efficient hardware accelerators for on-device deep-neural-network training

J Lee, HJ Yoo - IEEE Open Journal of the Solid-State Circuits …, 2021 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have been widely used in various artificial intelligence (AI)
applications due to their overwhelming performance. Furthermore, recently, several …

7.7 LNPU: A 25.3 TFLOPS/W sparse deep-neural-network learning processor with fine-grained mixed precision of FP8-FP16

J Lee, J Lee, D Han, J Lee, G Park… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Recently, deep neural network (DNN) hardware accelerators have been reported for energy-
efficient deep learning (DL) acceleration [1–6]. Most prior DNN inference accelerators are …

Accelerating neural network inference on FPGA-based platforms—A survey

R Wu, X Guo, J Du, J Li - Electronics, 2021 - mdpi.com
The breakthrough of deep learning has started a technological revolution in various areas
such as object identification, image/video recognition and semantic segmentation. Neural …

Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …

A 0.32–128 TOPS, scalable multi-chip-module-based deep neural network inference accelerator with ground-referenced signaling in 16 nm

B Zimmer, R Venkatesan, YS Shao… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
Custom accelerators improve the energy efficiency, area efficiency, and performance of
deep neural network (DNN) inference. This article presents a scalable DNN accelerator …

Non-structured DNN weight pruning—Is it beneficial in any platform?

X Ma, S Lin, S Ye, Z He, L Zhang… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Large deep neural network (DNN) models pose the key challenge to energy efficiency due
to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or …

Snap: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference

JF Zhang, CE Lee, C Liu, YS Shao… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
Recent developments in deep neural network (DNN) pruning introduce data sparsity to
enable deep learning applications to run more efficiently on resource- and energy-…

HNPU-V1: An Adaptive DNN Training Processor Utilizing Stochastic Dynamic Fixed-Point and Active Bit-Precision Searching

D Han, HJ Yoo - On-Chip Training NPU-Algorithm, Architecture and …, 2023 - Springer
This chapter presents HNPU, an energy-efficient DNN training processor that adopts
algorithm–hardware co-design. The HNPU supports stochastic dynamic fixed-point …