A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022 - ieeexplore.ieee.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

Breaking the von Neumann bottleneck: architecture-level processing-in-memory technology

X Zou, S Xu, X Chen, L Yan, Y Han - Science China Information Sciences, 2021 - Springer
The “memory wall” problem or so-called von Neumann bottleneck limits the efficiency of
conventional computer architectures, which move data from memory to CPU for …

GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks

J Li, A Louri, A Karanth… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Graph convolutional neural networks (GCNs) have emerged as an effective approach to
extend deep learning for graph data analytics. Given that graphs are usually irregular, as …

SISA: Set-centric instruction set architecture for graph mining on processing-in-memory systems

M Besta, R Kanakagiri, G Kwasniewski… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Simple graph algorithms such as PageRank have been the target of numerous hardware
accelerators. Yet, there also exist much more complex graph mining algorithms for problems …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture

J Gómez-Luna, IE Hajj, I Fernandez… - arXiv preprint arXiv …, 2021 - arxiv.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

Fafnir: Accelerating sparse gathering by using efficient near-memory intelligent reduction

B Asgari, R Hadidi, J Cao, SK Lim… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Memory-bound sparse gathering, caused by irregular random memory accesses, has
become an obstacle in several on-demand applications such as embedding lookup in …

SynCron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …