A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …

Flexagon: A multi-dataflow sparse-sparse matrix multiplication accelerator for efficient DNN processing

F Muñoz-Martínez, R Garg, M Pellauer… - Proceedings of the 28th …, 2023 - dl.acm.org
Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse Matrix
Multiplication (SpMSpM) accelerators are tailored to a particular SpMSpM dataflow (i.e., Inner …

TeAAL: A declarative framework for modeling sparse tensor accelerators

N Nayak, TO Odemuyiwa, S Ugare, C Fletcher… - Proceedings of the 56th …, 2023 - dl.acm.org
Over the past few years, the explosion in sparse tensor algebra workloads has led to a
corresponding rise in domain-specific accelerators to service them. Due to the irregularity …

AppCiP: Energy-efficient approximate convolution-in-pixel scheme for neural network acceleration

S Tabrizchi, A Nezhadi, S Angizi… - IEEE Journal on …, 2023 - ieeexplore.ieee.org
Nowadays, always-on intelligent and self-powered visual perception systems have gained
considerable attention and are widely used. However, capturing data and analyzing it via a …

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

H Fan, SI Venieris, A Kouris, N Lane - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Running multiple deep neural networks (DNNs) in parallel has become an emerging
workload in both edge devices, such as mobile phones where multiple tasks serve a single …

MR-PIPA: An Integrated Multilevel RRAM (HfOx)-Based Processing-In-Pixel Accelerator

M Abedin, A Roohi, M Liehr, N Cady… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a
multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and …

From CNN to DNN hardware accelerators: A survey on design, exploration, simulation, and frameworks

LR Juracy, R Garibotti, FG Moraes - Foundations and Trends® …, 2023 - nowpublishers.com
Over the past decade, a massive proliferation of machine learning algorithms has emerged,
from applications for surveillance to self-driving cars. The turning point occurred with the …

TaskFusion: An efficient transfer learning architecture with dual delta sparsity for multi-task natural language processing

Z Fan, Q Zhang, P Abillama, S Shoouri, C Lee… - Proceedings of the 50th …, 2023 - dl.acm.org
The combination of pre-trained models and task-specific fine-tuning schemes, such as
BERT, has achieved great success in various natural language processing (NLP) tasks …

UNICO: Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration

B Rashidi, C Gao, S Lu, Z Wang, C Zhou… - Proceedings of the 56th …, 2023 - dl.acm.org
Specialized hardware has become an indispensable component to deep neural network
(DNN) acceleration. To keep up with the rapid evolution of neural networks, holistic and …