At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

J Domke, E Vatai, B Gerofi, Y Kodama… - arXiv preprint arXiv …, 2022 - researchgate.net
Over the last three decades, innovations in the memory subsystem were primarily targeted at
overcoming the data movement bottleneck. In this paper, we focus on a specific market trend …

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

J Gómez-Luna, Y Guo, GF Oliveira… - arXiv preprint arXiv …, 2023 - arxiv.org
Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern
computing systems. However, current real-world PIM systems have the inherent …

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

GF Oliveira, A Boroumand, S Ghose… - 2022 IEEE Computer …, 2022 - ieeexplore.ieee.org
Today's computing systems require moving data back-and-forth between computing
resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation …

NDP-RANK: Prediction and ranking of NDP systems performance using machine learning

V Iskandar, MA Abd El Ghany, D Goehringer - Microprocessors and …, 2023 - Elsevier
The near-data processing (NDP) paradigm has recently gained popularity as a promising
method for mitigating the memory wall challenges of future computing systems. Modern 3D …

Analysis of Conventional, Near-Memory, and In-Memory DNN Accelerators

T Glint, CK Jha, M Awasthi… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Various DNN accelerators based on Conventional compute Hardware Accelerator (CHA),
Near-Data-Processing (NDP) and Processing-in-Memory (PIM) paradigms have been …

Accelerating Irregular Applications via Efficient Synchronization and Data Access Techniques

C Giannoula - arXiv preprint arXiv:2211.05908, 2022 - arxiv.org
Irregular applications comprise an increasingly important workload domain for many fields,
including bioinformatics, chemistry, physics, social sciences and machine learning …

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures

GF Oliveira, J Gómez-Luna, S Ghose… - 2022 IEEE Computer …, 2022 - ieeexplore.ieee.org
The increasing prevalence and growing size of data in modern applications have led to high
costs for computation in traditional processor-centric computing systems. Moving large …

PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures

GF Oliveira, EG Esposito, J Gómez-Luna… - arXiv preprint arXiv …, 2024 - arxiv.org
Processing-using-DRAM (PUD) architectures impose a restrictive data layout and alignment
for their operands, where source and destination operands (i) must reside in the same …

Improving DRAM Performance, Reliability, and Security by Rigorously Understanding Intrinsic DRAM Operation

H Hassan - arXiv preprint arXiv:2303.07445, 2023 - arxiv.org
DRAM is the primary technology used for main memory in modern systems. Unfortunately,
as DRAM scales down to smaller technology nodes, it faces key challenges in both data …

TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives

R Wong, N Kim, K Higgs, S Agarwal, E Ipek… - arXiv preprint arXiv …, 2024 - arxiv.org
As the amount of data produced in society continues to grow at an exponential rate, modern
applications are incurring significant performance and energy penalties due to high data …