Scheduling techniques for GPU architectures with processing-in-memory capabilities

A Pattnaik, X Tang, A Jog, O Kayiran… - Proceedings of the …, 2016 - dl.acm.org
Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …

Volatile and nonvolatile memory devices for neuromorphic and processing-in-memory applications

S Cho - J Semicond Technol Sci, 2022 - journal.auric.kr
The motivation for driving semiconductor devices can be found in the development of
advanced computers which can contribute to the betterment in our daily lives. The …

In-memory data parallel processor

D Fujiki, S Mahlke, R Das - ACM SIGPLAN Notices, 2018 - dl.acm.org
Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for
in-memory computing. Despite the significant performance gain offered by computational …

Understanding latency variation in modern DRAM chips: Experimental characterization, analysis, and optimization

KK Chang, A Kashyap, H Hassan, S Ghose… - Proceedings of the …, 2016 - dl.acm.org
Long DRAM latency is a critical performance bottleneck in current systems. DRAM access
latency is defined by three fundamental operations that take place within the DRAM cell …

The RowHammer problem and other issues we may face as memory becomes denser

O Mutlu - Design, Automation & Test in Europe Conference & …, 2017 - ieeexplore.ieee.org
As memory scales down to smaller technology nodes, new failure mechanisms emerge that
threaten its correct operation. If such failure mechanisms are not anticipated and corrected …

Figaro: Improving system performance via fine-grained in-DRAM data relocation and caching

Y Wang, L Orosa, X Peng, Y Guo… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Main memory, composed of DRAM, is a performance bottleneck for many applications, due
to the high DRAM access latency. In-DRAM caches work to mitigate this latency by …

LazyPIM: An efficient cache coherence mechanism for processing-in-memory

A Boroumand, S Ghose, M Patel… - IEEE Computer …, 2016 - ieeexplore.ieee.org
Processing-in-memory (PIM) architectures cannot use traditional approaches to cache
coherence due to the high off-chip traffic consumed by coherence messages. We propose …

QUAC-TRNG: High-throughput true random number generation using quadruple row activation in commodity DRAM chips

A Olgun, M Patel, AG Yağlıkçı, H Luo… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
True random number generators (TRNG) sample random physical processes to create large
amounts of random numbers for various use cases, including security-critical cryptographic …

NATSA: a near-data processing accelerator for time series analysis

I Fernandez, R Quislant, E Gutiérrez… - 2020 IEEE 38th …, 2020 - ieeexplore.ieee.org
Time series analysis is a key technique for extracting and predicting events in domains as
diverse as epidemiology, genomics, neuroscience, environmental sciences, economics, and …

Benchmarking memory-centric computing systems: Analysis of real processing-in-memory hardware

J Gómez-Luna, I El Hajj, I Fernandez… - 2021 12th …, 2021 - ieeexplore.ieee.org
Many modern workloads such as neural network inference and graph processing are
fundamentally memory-bound. For such workloads, data movement between memory and …