Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories

S Li, C Xu, Q Zou, J Zhao, Y Lu, Y Xie - Proceedings of the 53rd Annual …, 2016 - dl.acm.org
Processing-in-memory (PIM) provides high bandwidth, massive parallelism, and high
energy efficiency by implementing computations in main memory, therefore eliminating the …

Compute caches

S Aga, S Jeloka, A Subramaniyan… - … Symposium on High …, 2017 - ieeexplore.ieee.org
This paper presents the Compute Cache architecture that enables in-place computation in
caches. Compute Caches uses emerging bit-line SRAM circuit technology to re-purpose …

Computing in memory with spin-transfer torque magnetic RAM

S Jain, A Ranjan, K Roy… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
In-memory computing is a promising approach to addressing the processor-memory data
transfer bottleneck in computing systems. We propose spin-transfer torque compute-in …

Processing-in-memory: A workload-driven perspective

S Ghose, A Boroumand, JS Kim… - IBM Journal of …, 2019 - ieeexplore.ieee.org
Many modern and emerging applications must process increasingly large volumes of data.
Unfortunately, prevalent computing paradigms are not designed to efficiently handle such …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Transparent offloading and mapping (TOM) enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-
chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation

K Hsieh, S Khan, N Vijaykumar… - 2016 IEEE 34th …, 2016 - ieeexplore.ieee.org
Pointer chasing is a fundamental operation, used by many important data-intensive
applications (e.g., databases, key-value stores, graph processing workloads) to traverse …

Low-cost inter-linked subarrays (LISA): Enabling fast inter-subarray data movement in DRAM

KK Chang, PJ Nair, D Lee, S Ghose… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper introduces a new DRAM design that enables fast and energy-efficient bulk data
movement across subarrays in a DRAM chip. While bulk data movement is a key operation …

GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies

JS Kim, D Senol Cali, H Xin, D Lee, S Ghose, M Alser… - BMC Genomics, 2018 - Springer
Background: Seed location filtering is critical in DNA read mapping, a process where billions
of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to …

Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture

J Gómez-Luna, IE Hajj, I Fernandez… - arXiv preprint arXiv …, 2021 - arxiv.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …