Spade: A flexible and scalable accelerator for SpMM and SDDMM

G Gerogiannis, S Yesil, D Lenadora, D Cao… - Proceedings of the 50th …, 2023 - dl.acm.org
The widespread use of Sparse Matrix Dense Matrix Multiplication (SpMM) and Sampled
Dense Matrix Dense Matrix Multiplication (SDDMM) kernels makes them candidates for …

Menda: A near-memory multi-way merge solution for sparse transposition and dataflows

S Feng, X He, KY Chen, L Ke, X Zhang… - Proceedings of the 49th …, 2022 - dl.acm.org
Near-memory processing has been extensively studied to optimize memory intensive
workloads. However, none of the proposed designs address sparse matrix transposition, an …

SparseAdapt: Runtime control for sparse linear algebra on a reconfigurable accelerator

S Pal, A Amarnath, S Feng, M O'Boyle… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Dynamic adaptation is a post-silicon optimization technique that adapts the hardware to
workload phases. However, current adaptive approaches are oblivious to implicit phases …

Special session: Towards an agile design methodology for efficient, reliable, and secure ML systems

S Dave, A Marchisio, MA Hanif… - 2022 IEEE 40th VLSI …, 2022 - ieeexplore.ieee.org
The real-world use cases of Machine Learning (ML) have exploded over the past few years.
However, the current computing infrastructure is insufficient to support all real-world …

Cosparse: A software and hardware reconfigurable SpMV framework for graph analytics

S Feng, J Sun, S Pal, X He, K Kaszyk… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
Sparse matrix-vector multiplication (SpMV) is a critical building block for iterative graph
analytics algorithms. Typically, such algorithms have a varying active vertex set across …

Versa: A 36-core systolic multiprocessor with dynamically reconfigurable interconnect and memory

S Kim, M Fayazi, A Daftardar, KY Chen… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
We present Versa, an energy-efficient 36-core systolic multiprocessor with dynamically
reconfigurable interconnects and memory. Versa leverages reconfigurable functional units …

An adjustable farthest point sampling method for approximately-sorted point cloud data

J Li, J Zhou, Y Xiong, X Chen… - 2022 IEEE Workshop …, 2022 - ieeexplore.ieee.org
Sampling is an essential part of raw point cloud data processing, such as in the popular
PointNet++ scheme. Farthest Point Sampling (FPS), which iteratively samples the farthest …

Blocks: Challenging SIMDs and VLIWs with a reconfigurable architecture

M Wijtvliet, A Kumar, H Corporaal - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Demand for coarse grain reconfigurable architectures (CGRAs) has significantly increased
in recent years as architectures need to be both energy efficient and flexible. However, most …

OnSRAM: Efficient inter-node on-chip scratchpad management in deep learning accelerators

S Pal, S Venkataramani, V Srinivasan… - ACM Transactions on …, 2022 - dl.acm.org
Hardware acceleration of Artificial Intelligence (AI) workloads has gained widespread
popularity with its potential to deliver unprecedented performance and efficiency. An …

Rewriting History: Repurposing Domain-Specific CGRAs

J Woodruff, T Koehler, A Brauckmann… - arXiv preprint arXiv …, 2023 - arxiv.org
Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both
the flexibility of FPGAs and the performance of ASICs. However, with restricted domains …