Survey of CPU and memory simulators in computer architecture: A comprehensive analysis including compiler integration and emerging technology applications

I Hwang, J Lee, H Kang, G Lee, H Kim - Simulation Modelling Practice and …, 2024 - Elsevier
In computer architecture studies, simulators are crucial for design verification, reducing
research and development time and ensuring the high accuracy of verification results …

PIM-trie: A Skew-resistant Trie for Processing-in-Memory

H Kang, Y Zhao, GE Blelloch, L Dhulipala… - Proceedings of the 35th …, 2023 - dl.acm.org
Memory latency and bandwidth are significant bottlenecks in designing in-memory indexes.
Processing-in-memory (PIM), an emerging hardware design approach, alleviates this …

TEFLON: Thermally Efficient Dataflow-Aware 3D NoC for Accelerating CNN Inferencing on Manycore PIM Architectures

G Narang, C Ogbogu, JR Doppa… - ACM Transactions on …, 2024 - dl.acm.org
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM)
architectures are used extensively to accelerate inferencing/training with convolutional …

Training Neural Networks With In-Memory-Computing Hardware and Multi-Level Radix-4 Inputs

C Grimm, J Lee, N Verma - … on Circuits and Systems I: Regular …, 2024 - ieeexplore.ieee.org
Training Deep Neural Networks (DNNs) requires a large number of operations, among
which matrix-vector multiplies (MVMs), often of high dimensionality, dominate. In-Memory …

A 28-nm 8-bit Floating-Point Tensor Core-Based Programmable CNN Training Processor With Dynamic Structured Sparsity

SK Venkataramanaiah, J Meng, HS Suh… - IEEE Journal of Solid …, 2023 - ieeexplore.ieee.org
Training deep/convolutional neural networks (DNNs/CNNs) requires a large amount of
memory and iterative computation, which necessitates speedup and energy reduction …

RedMule: A mixed-precision matrix–matrix operation engine for flexible and energy-efficient on-chip linear algebra and TinyML training acceleration

Y Tortorella, L Bertaccini, L Benini, D Rossi… - Future Generation …, 2023 - Elsevier
The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a
few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to …

In-depth survey of processing-in-memory architectures for deep neural networks

JH Jang, J Shin, JT Park, IS Hwang… - JOURNAL OF …, 2023 - journal.auric.kr
Processing-in-Memory (PIM) is an emerging computing architecture that has gained
significant attention in recent times. It aims to maximize data movement efficiency by moving …

SP-PIM: A Super-Pipelined Processing-In-Memory Accelerator With Local Error Prediction for Area/Energy-Efficient On-Device Learning

J Heo, JH Kim, W Han, J Kim… - IEEE Journal of Solid …, 2024 - ieeexplore.ieee.org
Over the past few years, on-device learning (ODL) has become an integral aspect of the
success of edge devices that embrace machine learning (ML) since it plays a crucial role in …

All-Digital Computing-in-Memory Macro Supporting FP64-Based Fused Multiply-Add Operation

D Li, K Mo, L Liu, B Pan, W Li, W Kang, L Li - Applied Sciences, 2023 - mdpi.com
Recently, frequent data movement between computing units and memory during floating-
point arithmetic has become a major problem for scientific computing. Computing-in-memory …

EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning

J Kim, S Han, G Ko, JH Kim, C Lee… - IEEE Journal of Solid …, 2024 - ieeexplore.ieee.org
Deep neural networks (DNNs) have recently gained significant prominence in various real-
world applications such as image recognition, natural language processing, and …