A survey on deep learning hardware accelerators for heterogeneous HPC platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …

The sparse abstract machine

O Hsu, M Strange, R Sharma, J Won… - Proceedings of the 28th …, 2023 - dl.acm.org
We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting
sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators …

Symphony: Orchestrating sparse and dense tensors with hierarchical heterogeneous processing

M Pellauer, J Clemons, V Balaji, N Crago… - ACM Transactions on …, 2023 - dl.acm.org
Sparse tensor algorithms are becoming widespread, particularly in the domains of deep
learning, graph and data analytics, and scientific computing. Current high-performance …

Aha: An agile approach to the design of coarse-grained reconfigurable accelerators and compilers

K Koul, J Melchert, K Sreedhar, L Truong… - ACM Transactions on …, 2023 - dl.acm.org
With the slowing of Moore's law, computer architects have turned to domain-specific
hardware specialization to continue improving the performance and efficiency of computing …

Peak: A single source of truth for hardware design and verification

C Donovick, J Melchert, R Daly, L Truong… - ACM Transactions on …, 2023 - dl.acm.org
Domain-specific languages for hardware can significantly enhance designer productivity,
but sometimes at the cost of ease of verification. On the other hand, ISA specification …

The Dataflow Abstract Machine Simulator Framework

N Zhang, R Lacouture, G Sohn, P Mure… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The growing interest in novel dataflow architectures and streaming execution paradigms has
created the need for a simulator optimized for modeling dataflow systems. To fill this need …

Sustainable Hardware Specialization

P Dangi, TK Bandara, S Sheikhpour, T Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org
Hardware specialization is commonly viewed as a way to scale performance in the dark
silicon era with modern-day SoCs featuring multiple tens of dedicated accelerators. By only …

DAP: A 507-GMACs/J 256-Core Domain Adaptive Processor for Wireless Communication and Linear Algebra Kernels in 12-nm FINFET

KY Chen, CS Yang, YH Sun, CW Tseng… - IEEE Journal of Solid …, 2024 - ieeexplore.ieee.org
We present domain adaptive processor (DAP), a programmable systolic-array processor
designed for wireless communication and linear algebra workloads. DAP uses a globally …

Efficient open modification spectral library searching in high-dimensional space with multi-level-cell memory

K Fan, WC Chen, S Pinge, HSP Wong… - Proceedings of the 61st …, 2024 - dl.acm.org
Open Modification Search (OMS) is a promising algorithm for mass spectrometry analysis
that enables the discovery of modified peptides. However, OMS encounters challenges as it …

Canal: A flexible interconnect generator for coarse-grained reconfigurable arrays

J Melchert, K Zhang, Y Mei, M Horowitz… - IEEE Computer …, 2023 - ieeexplore.ieee.org
The architecture of a coarse-grained reconfigurable array (CGRA) interconnect has a
significant effect on not only the flexibility of the resulting accelerator, but also its power …