Deep learning acceleration with neuron-to-memory transformation

M Imani, MS Razlighi, Y Kim, S Gupta… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Deep neural networks (DNN) have demonstrated effectiveness for various applications such
as image processing, video segmentation, and speech recognition. Running state-of-the-art …

XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks

A Al Bahou, G Karunaratne, R Andri… - … IEEE Symposium in …, 2018 - ieeexplore.ieee.org
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory.
This precludes the implementation of CNNs in low-power embedded systems. Recent …
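Accelerators for binary CNNs such as XNORBIN rely on a standard identity: when weights and activations are constrained to ±1, a dot product reduces to a bitwise XNOR followed by a population count. A minimal Python sketch of that identity (illustrative only, not XNORBIN's hardware datapath):

```python
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n ±1 vectors packed as n-bit integers
    (bit 1 encodes +1, bit 0 encodes -1)."""
    matches = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # XNOR: bits that agree
    pop = bin(matches).count("1")                  # population count
    return 2 * pop - n  # each agreement contributes +1, each disagreement -1

# Example: a = [+1, -1, +1, +1], w = [+1, +1, -1, +1], packed MSB-first
a = 0b1011
w = 0b1101
assert binary_dot(a, w, 4) == 0  # (+1)(+1)+(-1)(+1)+(+1)(-1)+(+1)(+1) = 0
```

The multiply-accumulate array of a conventional CNN accelerator thus collapses into XNOR gates and a popcount tree, which is the source of the very high TOp/s/W figures reported for binary hardware.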

Quantized deep neural networks for energy efficient hardware-based inference

R Ding, Z Liu, RDS Blanton… - 2018 23rd Asia and …, 2018 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have been adopted in many systems because of their higher
classification accuracy, with custom hardware implementations great candidates for high …

SparseTrain: Leveraging dynamic sparsity in software for training DNNs on general-purpose SIMD processors

Z Gong, H Ji, CW Fletcher, CJ Hughes… - Proceedings of the ACM …, 2020 - dl.acm.org
Our community has improved the efficiency of deep learning applications by exploiting
sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known …
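Exploiting dynamic sparsity means compacting the nonzero activations at run time so that zero entries contribute no work. A schematic Python sketch of the general idea (not SparseTrain's SIMD implementation, whose vectorization strategy is the paper's contribution):

```python
def sparse_dense_matvec(x, W):
    """Compute y = W^T x while skipping zero entries of x.
    The nonzero list is built at run time: activation sparsity is
    'dynamic' because it depends on the input, unlike weight sparsity."""
    nz = [(i, v) for i, v in enumerate(x) if v != 0.0]
    y = [0.0] * len(W[0])
    for i, v in nz:              # work is proportional to nnz(x), not len(x)
        for j, wij in enumerate(W[i]):
            y[j] += v * wij
    return y

x = [0.0, 2.0, 0.0]              # e.g. post-ReLU activations
W = [[1.0, 1.0], [3.0, -1.0], [5.0, 2.0]]
assert sparse_dense_matvec(x, W) == [6.0, -2.0]
```

ReLU networks commonly produce many dynamic zeros in activations (and, during training, in gradients), which is why this pays off even without pruned weights.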

A mixed-precision RISC-V processor for extreme-edge DNN inference

G Ottavi, A Garofalo, G Tagliavini… - 2020 IEEE Computer …, 2020 - ieeexplore.ieee.org
Low bit-width Quantized Neural Networks (QNNs) enable deployment of complex machine
learning models on constrained devices such as microcontrollers (MCUs) by reducing their …
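The memory savings behind low bit-width QNNs come from mapping float parameters onto a small signed-integer grid plus a scale factor. A hedged sketch of plain uniform symmetric quantization (illustrative only; the paper's mixed-precision scheme and its RISC-V ISA support are more involved):

```python
def quantize_symmetric(x, bits):
    """Uniformly quantize a list of floats to signed `bits`-bit integers
    with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax or 1.0   # avoid scale == 0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.9, -0.45, 0.1, -0.02]
q, s = quantize_symmetric(w, 4)   # 4-bit: integers in [-8, 7]
```

A "mixed-precision" deployment simply applies this per tensor with different `bits` per layer, so sensitive layers keep 8 bits while tolerant ones drop to 4 or 2.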

A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 ns and 55.8 TOPS/W fully parallel product-sum operation for binary DNN edge …

WS Khwa, JJ Chen, JF Li, X Si, EY Yang… - … Solid-State Circuits …, 2018 - ieeexplore.ieee.org
For deep-neural-network (DNN) processors [1-4], the product-sum (PS) operation
predominates the computational workload for both convolution (CNVL) and fully-connect …

RNSnet: In-memory neural network acceleration using residue number system

S Salamat, M Imani, S Gupta… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
We live in a world where technological advances are continually creating more data than
what we can deal with. Machine learning algorithms, in particular Deep Neural Networks …
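A residue number system represents an integer by its remainders modulo a set of pairwise-coprime moduli, so addition and multiplication proceed independently per residue channel with no carry propagation between channels. A minimal sketch with illustrative moduli (the moduli and the in-memory mapping in RNSnet are the paper's own design choices):

```python
from math import prod

MODULI = (3, 5, 7)  # pairwise-coprime example; dynamic range = 3*5*7 = 105

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    # Each residue channel multiplies independently: no inter-channel carries.
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(MODULI)
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return x % M

assert from_rns(rns_mul(to_rns(9), to_rns(11))) == 99
```

The carry-free channels are what make RNS attractive for in-memory arithmetic: each narrow modulus can be handled by a small independent unit, at the cost of a CRT conversion at the boundary.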

BitFlow: Exploiting vector parallelism for binary neural networks on CPU

Y Hu, J Zhai, D Li, Y Gong, Y Zhu… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
Deep learning has revolutionized computer vision and other fields since its big bang in
2012. However, it is challenging to deploy Deep Neural Networks (DNNs) into real-world …

BiQGEMM: Matrix multiplication with lookup table for binary-coding-based quantized DNNs

Y Jeon, B Park, SJ Kwon, B Kim… - … Conference for High …, 2020 - ieeexplore.ieee.org
The number of parameters in deep neural networks (DNNs) is rapidly increasing to support
complicated tasks and to improve model accuracy. Correspondingly, the amount of …
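With binary-coded weights, every weight row is a ± sign pattern over the activations, so many rows reuse the same signed partial sums. The general lookup-table trick is to precompute, per small activation sub-vector, the signed sum for every possible bit pattern, turning multiply-adds into table lookups. A schematic Python sketch of that idea (an assumption-laden illustration, not BiQGEMM's tuned GEMM kernel):

```python
def lut_binary_matvec(weight_bits, x, mu=4):
    """y = B @ x where B has entries in {+1, -1}, encoded row-wise as
    mu-bit patterns (bit 1 -> +1, bit 0 -> -1)."""
    n = len(x)
    assert n % mu == 0
    # Precompute, for each mu-wide chunk of x, the signed sum for every
    # possible pattern: tables[c][p] = sum over the chunk of +/- x[j].
    tables = []
    for c in range(0, n, mu):
        chunk = x[c:c + mu]
        tables.append([
            sum(v if (p >> (mu - 1 - j)) & 1 else -v
                for j, v in enumerate(chunk))
            for p in range(1 << mu)
        ])
    # Each output row now costs n/mu table lookups instead of n MACs,
    # and the tables are shared across all rows.
    return [sum(tables[k][p] for k, p in enumerate(row))
            for row in weight_bits]

x = [1.0, 2.0, 3.0, 4.0]
# One weight row [+1, -1, +1, -1], packed MSB-first as 0b1010
assert lut_binary_matvec([[0b1010]], x) == [1 - 2 + 3 - 4]
```

The precompute cost (2^mu sums per chunk) amortizes because activations are reused across every output row, which is why the approach favors tall weight matrices.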

Proteus: Exploiting numerical precision variability in deep neural networks

P Judd, J Albericio, T Hetherington… - Proceedings of the …, 2016 - dl.acm.org
This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision
numerical representations and specifically, their recently demonstrated ability to tolerate …
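Tolerance to reduced precision can be made concrete by rounding values onto a signed fixed-point grid with a per-layer bit budget. A hedged sketch of that conversion (Proteus's actual storage format and per-layer profiling are the paper's contribution; this only illustrates the underlying precision knob):

```python
def to_fixed(x, int_bits, frac_bits):
    """Round x onto a signed fixed-point grid with int_bits + frac_bits
    total bits, saturating at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits - 1))
    hi = (1 << (int_bits + frac_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

# Per-layer precision variability: one layer may tolerate 8 total bits
# (e.g. 1 integer + 7 fraction) while another needs more.
assert to_fixed(0.728, 1, 7) == 0.7265625   # nearest multiple of 1/128
```

Sweeping `int_bits`/`frac_bits` per layer and measuring accuracy loss is the basic experiment behind per-layer precision selection.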