Machine learning at Facebook: Understanding inference at the edge

CJ Wu, D Brooks, K Chen, D Chen… - … symposium on high …, 2019 - ieeexplore.ieee.org
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …

[Book] Efficient processing of deep neural networks

V Sze, YH Chen, TJ Yang, JS Emer - 2020 - Springer
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …

SCALE-Sim: Systolic CNN accelerator simulator

A Samajdar, Y Zhu, P Whatmough, M Mattina… - arXiv preprint arXiv …, 2018 - arxiv.org
Systolic Arrays are one of the most popular compute substrates within Deep Learning
accelerators today, as they provide extremely high efficiency for running dense matrix …
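As a rough illustration of the dense-matrix dataflow that systolic arrays target, below is a minimal cycle-level sketch of an output-stationary systolic matrix multiply. It is not taken from SCALE-Sim; the array organization, operand skew, and naming are assumptions made only for illustration.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array computing C = A @ B.

    PE (i, j) holds and accumulates C[i, j]; operands arrive skewed so that at
    cycle t = i + j + k the PE multiplies A[i, k] (flowing right) by B[k, j]
    (flowing down).
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    total_cycles = (M - 1) + (N - 1) + K   # fill + drain latency of the array
    for t in range(total_cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j              # operand pair reaching PE (i, j) this cycle
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

A = np.random.randint(0, 5, (4, 3))
B = np.random.randint(0, 5, (3, 2))
assert np.array_equal(systolic_matmul(A, B), A @ B)
```

The skewed arrival of operands is what makes every multiply-accumulate a purely local, neighbor-to-neighbor operation, which is the source of the efficiency the snippet refers to.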

A systematic methodology for characterizing scalability of DNN accelerators using SCALE-Sim

A Samajdar, JM Joseph, Y Zhu… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org
The compute demand for deep learning workloads is well known and is a prime motivator for
powerful parallel computing platforms such as GPUs or dedicated hardware accelerators …

S2TA: Exploiting structured sparsity for energy-efficient mobile CNN acceleration

ZG Liu, PN Whatmough, Y Zhu… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network
(CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit …
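As a rough illustration of the structured (block-level) sparsity such accelerators exploit, here is a sketch of a dot product that skips all-zero weight blocks. The block size and names are assumptions for illustration, not S2TA's actual scheme.

```python
import numpy as np

def block_sparse_dot(weights, activations, block=4):
    """Dot product that skips weight blocks containing only zeros.

    Structured-sparse accelerators record which entries in each block are
    nonzero and only fetch/multiply those, so compute scales with density
    rather than with the nominal vector length.
    """
    acc = 0
    for start in range(0, len(weights), block):
        w_blk = weights[start:start + block]
        if not np.any(w_blk):          # an all-zero block costs no MACs
            continue
        nz = np.nonzero(w_blk)[0]      # indices of nonzero weights in the block
        acc += np.dot(w_blk[nz], activations[start:start + block][nz])
    return acc

w = np.array([0, 0, 0, 0, 2, 0, 1, 0, 0, 3, 0, 0], dtype=np.int32)
x = np.arange(12, dtype=np.int32)
assert block_sparse_dot(w, x) == int(np.dot(w, x))
```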

Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …

Building the computing system for autonomous micromobility vehicles: Design constraints and architectural optimizations

B Yu, W Hu, L Xu, J Tang, S Liu… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
This paper presents the computing system design in our commercial autonomous vehicles,
and provides detailed performance, energy, and cost analyses. Drawing from our …

High-throughput CNN inference on embedded ARM big.LITTLE multicore processors

S Wang, G Ananthanarayanan, Y Zeng… - … on Computer-Aided …, 2019 - ieeexplore.ieee.org
Internet of Things edge intelligence requires convolutional neural network (CNN) inference
to take place in the edge device itself. The ARM big.LITTLE architecture is at the heart of …

Think fast: A tensor streaming processor (TSP) for accelerating deep learning workloads

D Abts, J Ross, J Sparling… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-
sliced microarchitecture with memory units interleaved with vector and matrix deep learning …

An overview of sparsity exploitation in CNNs for on-device intelligence with software-hardware cross-layer optimizations

S Kang, G Park, S Kim, S Kim, D Han… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
This paper presents a detailed overview of sparsity exploitation in deep neural network
(DNN) accelerators. Despite the algorithmic advancements which drove DNNs to become …