A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, the systolic array has proven to be a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays has also encountered many …
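To ground the entry above, here is a minimal cycle-level sketch of the systolic-array idea: an output-stationary grid of processing elements in which rows of A and columns of B stream through with a one-cycle diagonal skew. This is a generic illustration of the architecture, not the design of any particular surveyed accelerator.

```python
def systolic_matmul(A, B):
    """Output-stationary systolic array sketch computing C = A @ B.

    PE (i, j) holds accumulator C[i][j].  Row i of A enters from the
    left and column j of B from the top, each skewed by one cycle per
    row/column, so operands A[i][s] and B[s][j] meet at PE (i, j) at
    cycle t = s + i + j."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for t in range(n + m + k - 2):          # cycles until the array drains
        for i in range(n):
            for j in range(m):
                s = t - i - j               # operand pair arriving this cycle
                if 0 <= s < k:
                    C[i][j] += A[i][s] * B[s][j]
    return C
```

Each PE performs at most one multiply-accumulate per cycle; the skew guarantees that matching operands arrive simultaneously, which is the property that makes the structure attractive for DNN matrix workloads.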

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …

Flexagon: A multi-dataflow sparse-sparse matrix multiplication accelerator for efficient DNN processing

F Muñoz-Martínez, R Garg, M Pellauer… - Proceedings of the 28th …, 2023 - dl.acm.org
Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse Matrix
Multiplication (SpMSpM) accelerators are tailored to a particular SpMSpM dataflow (i.e., Inner …
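For context, the inner-product dataflow named above can be sketched in software: each output element is formed by intersecting the sorted coordinate lists of a sparse row of A and a sparse column of B. This is a generic illustration of that dataflow, not Flexagon's hardware design; the list-of-(index, value)-pairs format stands in for CSR/CSC storage.

```python
def spmspm_inner_product(A_rows, B_cols):
    """Inner-product SpMSpM sketch: C[i][j] is the dot product of
    sparse row i of A and sparse column j of B, computed by merging
    their sorted (index, value) lists.  Only nonzero outputs are kept."""
    C = {}
    for i, row in enumerate(A_rows):
        for j, col in enumerate(B_cols):
            acc, p, q = 0, 0, 0
            while p < len(row) and q < len(col):
                ka, va = row[p]
                kb, vb = col[q]
                if ka == kb:                # indices match: multiply-accumulate
                    acc += va * vb
                    p += 1
                    q += 1
                elif ka < kb:               # advance the lagging list
                    p += 1
                else:
                    q += 1
            if acc:
                C[(i, j)] = acc
    return C
```

The intersection step is cheap when both operands are very sparse, but every (row, column) pair is visited, which is exactly the kind of dataflow trade-off such accelerators are specialized around.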

TeAAL: A declarative framework for modeling sparse tensor accelerators

N Nayak, TO Odemuyiwa, S Ugare, C Fletcher… - Proceedings of the 56th …, 2023 - dl.acm.org
Over the past few years, the explosion in sparse tensor algebra workloads has led to a
corresponding rise in domain-specific accelerators to service them. Due to the irregularity …

AppCiP: Energy-efficient approximate convolution-in-pixel scheme for neural network acceleration

S Tabrizchi, A Nezhadi, S Angizi… - IEEE Journal on …, 2023 - ieeexplore.ieee.org
Nowadays, always-on, intelligent, and self-powered visual perception systems have gained
considerable attention and are widely used. However, capturing data and analyzing it via a …

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

H Fan, SI Venieris, A Kouris, N Lane - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Running multiple deep neural networks (DNNs) in parallel has become an emerging
workload on both edge devices, such as mobile phones, where multiple tasks serve a single …

MR-PIPA: An Integrated Multilevel RRAM (HfOx)-Based Processing-In-Pixel Accelerator

M Abedin, A Roohi, M Liehr, N Cady… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a
multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and …

TaskFusion: An efficient transfer learning architecture with dual delta sparsity for multi-task natural language processing

Z Fan, Q Zhang, P Abillama, S Shoouri, C Lee… - Proceedings of the 50th …, 2023 - dl.acm.org
The combination of pre-trained models and task-specific fine-tuning schemes, such as
BERT, has achieved great success in various natural language processing (NLP) tasks …
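The delta-sparsity idea named in the title can be illustrated generically: a task-specific weight matrix is the shared pre-trained matrix plus a sparse per-task delta, so the dense shared product is computed once and each task adds only a cheap sparse correction. This sketch is an illustrative assumption about that general scheme, not TaskFusion's architecture; `delta_sparse_infer` and its (row, col, value) delta format are hypothetical.

```python
import numpy as np

def delta_sparse_infer(x, W_base, deltas):
    """Multi-task inference sketch with sparse weight deltas.

    Each task's weights are W_base + dW, where dW is sparse, so
    x @ W_base is computed once and shared across all tasks.
    deltas: {task: [(row, col, value), ...]} sparse updates."""
    shared = x @ W_base                      # dense product, computed once
    outputs = {}
    for task, dW in deltas.items():
        y = shared.copy()
        for r, c, v in dW:                   # cheap sparse correction
            y[:, c] += x[:, r] * v
        outputs[task] = y
    return outputs
```

The savings grow with the number of tasks, since the dense part of the work is amortized while each per-task cost scales only with the number of nonzero delta entries.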

SECDA: Efficient hardware/software co-design of FPGA-based DNN accelerators for edge inference

J Haris, P Gibson, J Cano, NB Agostini… - 2021 IEEE 33rd …, 2021 - ieeexplore.ieee.org
Edge computing devices inherently face tight resource constraints, which is especially
apparent when deploying Deep Neural Networks (DNNs) with high memory and compute …

Zero and narrow-width value-aware compression for quantized convolutional neural networks

M Jang, J Kim, H Nam, S Kim - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Convolutional neural networks (CNNs) are commonly used in systems with dedicated neural
processing units for CNN-related computations. For high performance and low hardware …
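The zero and narrow-width value idea in this title can be shown with a toy encoder: zero values cost only a tag, quantized values that fit in 4 bits are stored narrow, and the rest keep their full 8 bits. The tag scheme and bit widths here are illustrative assumptions, not the paper's actual compression format.

```python
def compress(values):
    """Toy zero / narrow-width compression for 8-bit quantized data.

    Each value gets a tag: 'Z' (zero, no payload), 'N' (narrow,
    4-bit payload), or 'F' (full, 8-bit payload)."""
    tags, payload = [], []
    for v in values:
        assert 0 <= v < 256
        if v == 0:
            tags.append('Z')              # zero: tag only
        elif v < 16:
            tags.append('N')              # fits in 4 bits
            payload.append((v, 4))
        else:
            tags.append('F')              # needs the full 8 bits
            payload.append((v, 8))
    return tags, payload

def compressed_bits(tags, payload, tag_bits=2):
    """Total size in bits: one fixed-width tag per value plus payloads."""
    return len(tags) * tag_bits + sum(width for _, width in payload)
```

Because quantized CNN tensors are dominated by zeros and small magnitudes, even a crude scheme like this beats storing every value at full width, which is the effect such compression hardware exploits.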