A survey of deep learning on CPUs: opportunities and co-optimizations

S Mittal, P Rajput, S Subramoney - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …

Optimizing CNNs on multicores for scalability, performance and goodput

S Rajbhandari, Y He, O Ruwase, M Carbin… - ACM SIGARCH …, 2017 - dl.acm.org
Convolutional Neural Networks (CNN) are a class of Artificial Neural Networks (ANN) that
are highly efficient at the pattern recognition tasks that underlie difficult AI problems in a …

REDUCT: Keep it close, keep it cool!: Efficient scaling of DNN inference on multi-core CPUs with near-cache compute

AV Nori, R Bera, S Balachandran… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Neural Networks (DNN) are used in a variety of applications and services. With the
evolving nature of DNNs, the race to build optimal hardware (both in datacenter and edge) …

Accelerating SLIDE deep learning on modern CPUs: Vectorization, quantizations, memory optimizations, and more

S Daghaghi, N Meisburger, M Zhao… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Deep learning implementations on CPUs (Central Processing Units) are gaining more
traction. Enhanced AI capabilities on commodity x86 architectures are commercially …

Caffe con Troll: Shallow ideas to speed up deep learning

S Hadjis, F Abuzaid, C Zhang, C Ré - … of the Fourth Workshop on Data …, 2015 - dl.acm.org
We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular
framework Caffe with rebuilt internals. We built CcT to examine the performance …

On scale-out deep learning training for cloud and HPC

S Sridharan, K Vaidyanathan, D Kalamkar… - arXiv preprint arXiv …, 2018 - arxiv.org
The exponential growth in use of large deep neural networks has accelerated the need for
training these deep neural networks in hours or even minutes. This can only be achieved …

A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …

IBM deep learning service

B Bhattacharjee, S Boag, C Doshi… - IBM Journal of …, 2017 - ieeexplore.ieee.org
Deep learning, driven by large neural network models, is overtaking traditional machine
learning methods for understanding unstructured and perceptual data domains such as …

Scaling support vector machines on modern HPC platforms

Y You, H Fu, SL Song, A Randles, D Kerbyson… - Journal of Parallel and …, 2015 - Elsevier
Support Vector Machines (SVM) have been widely used in data-mining and Big
Data applications as modern commercial databases start to attach an increasing importance …

Fast deep neural network training on distributed systems and cloud TPUs

Y You, Z Zhang, CJ Hsieh, J Demmel… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Since its creation, the ImageNet-1k benchmark set has played a significant role as a
benchmark for ascertaining the accuracy of different deep neural net (DNN) models on the …