An algorithm–hardware co-optimized framework for accelerating N:M sparse transformers

C Fang, A Zhou, Z Wang - IEEE Transactions on Very Large …, 2022 - ieeexplore.ieee.org
The Transformer has been an indispensable staple in deep learning. However, for real-life
applications, it is very challenging to deploy efficient Transformers due to the immense …

In-memory associative processors: Tutorial, potential, and challenges

ME Fouda, HE Yantır, AM Eltawil… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
In-memory computing is an emerging computing paradigm that overcomes the limitations of
existing von Neumann computing architectures, such as the memory-wall bottleneck. In such …

AM4: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

E Garzón, M Lanuzza, A Teman… - IEEE Journal on …, 2023 - ieeexplore.ieee.org
In-memory computing seeks to minimize data movement and alleviate the memory wall by
computing in-situ, in the same place that the data is located. One of the key emerging …

Neural architecture search for in-memory computing-based deep learning accelerators

O Krestinskaya, ME Fouda, H Benmeziane… - Nature Reviews …, 2024 - nature.com
The rapid growth of artificial intelligence and the increasing complexity of neural network
models are driving demand for efficient hardware architectures that can address power …

A hardware/software co-design methodology for in-memory processors

HE Yantır, AM Eltawil, KN Salama - Journal of Parallel and Distributed …, 2022 - Elsevier
The bottleneck between the processor and memory is the most significant barrier to the
ongoing development of efficient processing systems. Therefore, a research effort has begun to …

BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration

M Rakka, R Karami, AM Eltawil, ME Fouda… - arXiv preprint arXiv …, 2024 - arxiv.org
Mixed-precision quantized Neural Networks (NNs) are gaining traction for their
efficient realization in hardware, leading to higher throughput and lower energy. In …

[HTML][HTML] A Novel Multi-Mode Charge Pump in Word Line Driver for Compute-in-Memory Arrays

Z Lin, X Zhong, Z Yu, Y Dong, Z Huang, X Gu - Electronics, 2025 - mdpi.com
Flash memory, as the core unit of a compute-in-memory (CIM) array, requires multiple
positive and negative (PN) high voltages (HVs) for word lines (WLs) to operate during …

eF2lowSim: System-Level Simulator of eFlash-Based Compute-in-Memory Accelerators for Convolutional Neural Networks

J Wang, S Kim, J Heo, CS Park - 2023 Design, Automation & …, 2023 - ieeexplore.ieee.org
A new system-level simulator, eF2lowSim, is proposed to estimate the bit-accurate and
cycle-accurate performance of eFlash compute-in-memory (CIM) accelerators for …

A novel word line driver circuit for compute-in-memory based on the floating gate devices

X Gu, R Che, Y Dong, Z Yu - Electronics, 2023 - mdpi.com
In floating gate compute-in-memory (CIM) chips, due to the gate equivalent capacitance of
the large-scale array and the parasitic capacitance of the long-distance transmission wire, it …

FPonAP: Implementation of Floating Point Operations on Associative Processors

W Amer, M Rakka, F Kurdahi - IEEE Embedded Systems Letters, 2024 - ieeexplore.ieee.org
The associative processor (AP) is a processing in-memory (PIM) platform that avoids data
movement between the memory and the processor by running computations directly in the …