Deep neural network approximation for custom hardware: Where we've been, where we're going

E Wang, JJ Davis, R Zhao, HC Ng, X Niu… - ACM Computing …, 2019 - dl.acm.org
Deep neural networks have proven to be particularly effective in visual and audio
recognition tasks. Existing models tend to be computationally expensive and memory …

Toolflows for mapping convolutional neural networks on FPGAs: A survey and future directions

SI Venieris, A Kouris, CS Bouganis - ACM Computing Surveys (CSUR), 2018 - dl.acm.org
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-
art performance in various Artificial Intelligence tasks. To accelerate the experimentation and …

Machine learning at facebook: Understanding inference at the edge

CJ Wu, D Brooks, K Chen, D Chen… - … symposium on high …, 2019 - ieeexplore.ieee.org
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …

A configurable cloud-scale DNN processor for real-time AI

J Fowers, K Ovtcharov, M Papamichael… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Interactive AI-powered services require low-latency evaluation of deep neural network
(DNN) models-aka"" real-time AI"". The growing demand for computationally expensive …

PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference

A Ankit, IE Hajj, SR Chalamalasetti, G Ndu… - Proceedings of the …, 2019 - dl.acm.org
Memristor crossbars are circuits capable of performing analog matrix-vector multiplications,
overcoming the fundamental energy efficiency limitations of digital logic. They have been …

[图书][B] Efficient processing of deep neural networks

V Sze, YH Chen, TJ Yang, JS Emer - 2020 - Springer
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …

SparTen: A sparse tensor accelerator for convolutional neural networks

A Gondimalla, N Chesnut, M Thottethodi… - Proceedings of the …, 2019 - dl.acm.org
Convolutional neural networks (CNNs) are emerging as powerful tools for image
processing. Recent machine learning work has reduced CNNs' compute and data volumes …

Floatpim: In-memory acceleration of deep neural network training with high precision

M Imani, S Gupta, Y Kim, T Rosing - Proceedings of the 46th International …, 2019 - dl.acm.org
Processing In-Memory (PIM) has shown a great potential to accelerate inference tasks of
Convolutional Neural Network (CNN). However, existing PIM architectures do not support …

Deeprecsys: A system for optimizing end-to-end at-scale neural recommendation inference

U Gupta, S Hsia, V Saraph, X Wang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …

Tensordimm: A practical near-memory processing architecture for embeddings and tensor operations in deep learning

Y Kwon, Y Lee, M Rhu - Proceedings of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org
Recent studies from several hyperscalars pinpoint to embedding layers as the most memory-
intensive deep learning (DL) algorithm being deployed in today's datacenters. This paper …