Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org
After decades of research efforts, several manufacturers have started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures …
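
The kernel this paper targets is sparse matrix-vector multiplication (SpMV) over a compressed format such as CSR. A minimal sketch of the computation follows, assuming CSR storage; this is not the paper's PIM implementation, only the reference kernel it accelerates:

```python
import numpy as np

def spmv_csr(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A @ x, with A stored in CSR format.

    In a near-bank PIM design, disjoint row ranges of the outer loop would
    be dispatched to compute units sitting next to different DRAM banks.
    """
    y = np.zeros(len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += vals[k] * x[col_idx[k]]
    return y

# Example: 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals    = [4.0, 1.0, 2.0, 3.0, 5.0]
print(spmv_csr(row_ptr, col_idx, vals, np.array([1.0, 1.0, 1.0])))
# -> [5. 2. 8.]
```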

ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …
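
The fixed-point integer baseline the abstract refers to can be sketched as symmetric per-tensor quantization; this is the scheme ANT improves on by adapting the numerical data type to each tensor's value distribution, which is not modeled here:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric fixed-point quantization of a float tensor to int8.

    The single scale maps the largest magnitude to 127, so tensors with
    long-tailed distributions waste most of the representable range --
    the inefficiency that adaptive data types aim to fix.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # worst-case quantization error
```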

Sparse attention acceleration with synergistic in-memory pruning and on-chip recomputation

A Yazdanbakhsh, A Moradifirouzabadi… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
As its core computation, a self-attention mechanism gauges pairwise correlations across the
entire input sequence. Despite favorable performance, calculating pairwise correlations is …
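
The pairwise-correlation computation the abstract describes is standard scaled dot-product self-attention; a plain sketch follows to make the quadratic cost concrete (the paper's in-memory pruning and on-chip recomputation are not shown):

```python
import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product self-attention over a sequence of length n.

    `scores` is n x n: every token is correlated with every other token.
    This all-pairs matrix is the quadratic cost that sparse-attention
    accelerators prune.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # pairwise correlations
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V

n, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(self_attention(Q, K, V).shape)  # (8, 16)
```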

On the accuracy of analog neural network inference accelerators

TP Xiao, B Feinberg, CH Bennett… - IEEE Circuits and …, 2022 - ieeexplore.ieee.org
Specialized accelerators have recently garnered attention as a method to reduce the power
consumption of neural network inference. A promising category of accelerators utilizes …
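
Analog accelerators of this kind compute matrix-vector products via Ohm's and Kirchhoff's laws on a crossbar of programmable conductances. The toy model below shows how conductance perturbations translate into output error; the Gaussian noise term is purely illustrative and is not the paper's device error model:

```python
import numpy as np

def analog_mvm(W, x, noise_std=0.01, seed=1):
    """Toy model of an analog crossbar matrix-vector multiply.

    Weights are stored as device conductances; output currents sum along
    columns per Kirchhoff's current law. The Gaussian perturbation stands
    in for programming error and read noise -- an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    G = W + rng.normal(0.0, noise_std, W.shape)  # noisy conductances
    return G @ x

W = np.array([[0.2, 0.5], [0.7, 0.1]])
x = np.array([1.0, -1.0])
print(W @ x, analog_mvm(W, x))  # ideal vs. noisy analog result
```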

GenPIP: In-memory acceleration of genome analysis via tight integration of basecalling and read mapping

H Mao, M Alser, M Sadrosadati, C Firtina… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Nanopore sequencing is a widely used high-throughput genome sequencing technology
that can sequence long fragments of a genome into raw electrical signals at low cost …

CoDG-ReRAM: An algorithm-hardware co-design to accelerate semi-structured GNNs on ReRAM

Y Luo, P Behnam, K Thorat, Z Liu… - 2022 IEEE 40th …, 2022 - ieeexplore.ieee.org
Graph Convolutional Networks (GCNs) have attracted wide attention and are being applied to real-world problems. However, due to the ever-growing graph data with significant irregularities, off-chip …
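
The GCN layer computation that such ReRAM co-designs accelerate is a sparse, irregular neighborhood aggregation followed by a dense weight transform. A minimal sketch, assuming a row-normalized adjacency matrix with self-loops:

```python
import numpy as np

def gcn_layer(A_hat, X, W):
    """One graph convolution layer: H = ReLU(A_hat @ X @ W).

    A_hat is the normalized adjacency matrix -- sparse and irregular,
    the source of the off-chip traffic the abstract mentions; X holds
    node features and W the dense, learnable weights.
    """
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy 3-node path graph with self-loops, row-normalized
A_hat = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])
X = np.eye(3)                        # one-hot node features
W = np.random.randn(3, 4)
print(gcn_layer(A_hat, X, W).shape)  # (3, 4)
```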

Accelerating large-scale graph neural network training on crossbar diet

C Ogbogu, AI Arka, BK Joardar… - … on Computer-Aided …, 2022 - ieeexplore.ieee.org
Resistive random-access memory (ReRAM)-based manycore architectures enable
acceleration of graph neural network (GNN) inference and training. GNNs exhibit …

Software systems implementation and domain-specific architectures towards graph analytics

H Jin, H Qi, J Zhao, X Jiang, Y Huang, C Gui… - Intelligent …, 2022 - spj.science.org
Graph analytics, which mainly includes graph processing, graph mining, and graph learning,
has become increasingly important in several domains, including social network analysis …

Achieving the performance of all-bank in-DRAM PIM with standard memory interface: Memory-computation decoupling

Y Paik, CH Kim, WJ Lee, SW Kim - IEEE Access, 2022 - ieeexplore.ieee.org
Processing-in-Memory (PIM) has been actively studied to overcome the memory bottleneck
by placing computing units near or in memory, especially for efficiently processing low …

IVQ: In-memory acceleration of DNN inference exploiting varied quantization

F Liu, W Zhao, Z Wang, Y Zhao, T Yang… - … on Computer-Aided …, 2022 - ieeexplore.ieee.org
Weight quantization is well suited to coping with the ever-growing complexity of deep neural network (DNN) models. Diversified quantization schemes lead to diverse quantized bit …
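
Varied quantization assigns different bit widths to different layers. A minimal per-layer symmetric quantizer sketch follows; the bit-width assignment policy (shown here as a hard-coded illustrative mapping) is the paper's contribution and is not modeled:

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Symmetric uniform quantize-dequantize of a tensor at a given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

# Different layers get different bit widths, e.g. from a sensitivity search.
layers = {"conv1": np.random.randn(8, 8), "fc": np.random.randn(8, 8)}
bit_widths = {"conv1": 8, "fc": 4}   # illustrative assignment, not from the paper
for name, w in layers.items():
    w_q = quantize_symmetric(w, bit_widths[name])
    print(name, bit_widths[name], "bits, max err:", np.abs(w - w_q).max())
```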