SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Abstract Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …

RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!

T Andrulis, JS Emer, V Sze - … of the 50th Annual International Symposium …, 2023 - dl.acm.org
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …

An overview of sparsity exploitation in CNNs for on-device intelligence with software-hardware cross-layer optimizations

S Kang, G Park, S Kim, S Kim, D Han… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
This paper presents a detailed overview of sparsity exploitation in deep neural network
(DNN) accelerators. Despite the algorithmic advancements which drove DNNs to become …

STICKER-IM: A 65 nm computing-in-memory NN processor using block-wise sparsity optimization and inter/intra-macro data reuse

J Yue, Y Liu, Z Yuan, X Feng, Y He… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
Computing-in-memory (CIM) is a promising architecture for energy-efficient neural network
(NN) processors. Several CIM macros have demonstrated high energy efficiency, while CIM …

Structured pruning of RRAM crossbars for efficient in-memory computing acceleration of deep neural networks

J Meng, L Yang, X Peng, S Yu, D Fan… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The high computational complexity and a large number of parameters of deep neural
networks (DNNs) become the most intensive burden of deep learning hardware design …

AUTO-PRUNE: Automated DNN pruning and mapping for ReRAM-based accelerator

S Yang, W Chen, X Zhang, S He, Y Yin… - Proceedings of the ACM …, 2021 - dl.acm.org
Emergent ReRAM-based accelerators support in-memory computation to accelerate deep
neural network (DNN) inference. Weight matrix pruning of DNNs is a widely used technique …

Exploring compute-in-memory architecture granularity for structured pruning of neural networks

FH Meng, X Wang, Z Wang, EYJ Lee… - IEEE Journal on …, 2022 - ieeexplore.ieee.org
Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM)
crossbars is a promising approach for Deep Neural Network (DNN) acceleration. As the …

On-fiber photonic computing

M Yang, Z Zhong, M Ghobadi - Proceedings of the 22nd ACM Workshop …, 2023 - dl.acm.org
In the 1800s, Charles Babbage envisioned computers as analog devices. However, it was
not until 150 years later that a Mechanical Analog Computer was constructed for the US …

Bit-Transformer: Transforming bit-level sparsity into higher performance in ReRAM-based accelerator

F Liu, W Zhao, Z He, Z Wang, Y Zhao… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Resistive Random-Access-Memory (ReRAM) crossbar is one of the most promising neural
network accelerators, thanks to its in-memory and in-situ analog computing abilities for …

Designing efficient bit-level sparsity-tolerant memristive networks

B Lyu, S Wen, Y Yang, X Chang, J Sun… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
With the rapid progress of deep neural network (DNN) applications on memristive platforms,
there has been a growing interest in the acceleration and compression of memristive …