Tetris: Scalable and efficient neural network acceleration with 3D memory

M Gao, J Pu, X Yang, M Horowitz… - Proceedings of the Twenty …, 2017 - dl.acm.org
The high accuracy of deep neural networks (NNs) has led to the development of NN
accelerators that improve performance by two orders of magnitude. However, scaling these …

Superneurons: Dynamic GPU memory management for training deep neural networks

L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song… - Proceedings of the 23rd …, 2018 - dl.acm.org
Going deeper and wider in neural architectures improves their accuracy, while the limited
GPU DRAM places an undesired restriction on the network design domain. Deep Learning …

Cnvlutin: Ineffectual-neuron-free deep neural network computing

J Albericio, P Judd, T Hetherington, T Aamodt… - ACM SIGARCH …, 2016 - dl.acm.org
This work observes that a large fraction of the computations performed by Deep Neural
Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of …

Tangram: Optimized coarse-grained dataflow for scalable NN accelerators

M Gao, X Yang, J Pu, M Horowitz… - Proceedings of the Twenty …, 2019 - dl.acm.org
The use of increasingly larger and more complex neural networks (NNs) makes it critical to
scale the capabilities and efficiency of NN accelerators. Tiled architectures provide an …

NNPIM: A processing in-memory architecture for neural network acceleration

S Gupta, M Imani, H Kaur… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Neural networks (NNs) have shown great ability to process emerging applications such as
speech recognition, language recognition, image classification, video segmentation, and …

Proteus: Exploiting numerical precision variability in deep neural networks

P Judd, J Albericio, T Hetherington… - Proceedings of the …, 2016 - dl.acm.org
This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision
numerical representations and specifically, their recently demonstrated ability to tolerate …

SMAUG: End-to-end full-stack simulation infrastructure for deep learning workloads

S Xi, Y Yao, K Bhardwaj, P Whatmough… - ACM Transactions on …, 2020 - dl.acm.org
In recent years, there have been tremendous advances in hardware acceleration of deep
neural networks. However, most of the research has focused on optimizing accelerator …

Bit-pragmatic deep neural network computing

J Albericio, A Delmás, P Judd, S Sharify… - Proceedings of the 50th …, 2017 - dl.acm.org
Deep Neural Networks expose a high degree of parallelism, making them amenable to
highly data parallel architectures. However, data-parallel architectures often accept …

C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization

L Song, Y Wang, Y Han, X Zhao, B Liu… - Proceedings of the 53rd …, 2016 - dl.acm.org
Convolutional neural network (CNN) accelerators have been proposed as an efficient
hardware solution for deep learning based applications, which are known to be both …

Cambricon: An instruction set architecture for neural networks

S Liu, Z Du, J Tao, D Han, T Luo, Y Xie… - ACM SIGARCH …, 2016 - dl.acm.org
Neural Networks (NN) are a family of models for a broad range of emerging machine
learning and pattern recognition applications. NN techniques are conventionally executed …