Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications

J Park, M Naumov, P Basu, S Deng, A Kalaiah… - arXiv preprint arXiv …, 2018 - arxiv.org
The application of deep learning techniques resulted in remarkable improvement of
machine learning models. This paper provides detailed characterizations of deep learning …

Deep learning training in Facebook data centers: Design of scale-up and scale-out systems

M Naumov, J Kim, D Mudigere, S Sridharan… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale training is important to ensure high performance and accuracy of
machine-learning models. At Facebook we use many different models, including computer vision …

Applied machine learning at Facebook: A datacenter infrastructure perspective

K Hazelwood, S Bird, D Brooks… - … symposium on high …, 2018 - ieeexplore.ieee.org
Machine learning sits at the core of many essential products and services at Facebook. This
paper describes the hardware and software infrastructure that supports machine learning at …

Machine learning at Facebook: Understanding inference at the edge

CJ Wu, D Brooks, K Chen, D Chen… - … symposium on high …, 2019 - ieeexplore.ieee.org
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …

The architectural implications of Facebook's DNN-based personalized recommendation

U Gupta, CJ Wu, X Wang, M Naumov… - … Symposium on High …, 2020 - ieeexplore.ieee.org
The widespread application of deep learning has changed the landscape of computation in
data centers. In particular, personalized recommendation for content ranking is now largely …

MimicNet: Fast performance estimates for data center networks with machine learning

Q Zhang, KKW Ng, C Kazer, S Yan, J Sedoc… - Proceedings of the 2021 …, 2021 - dl.acm.org
At-scale evaluation of new data center network innovations is becoming increasingly
intractable. This is true for testbeds, where few, if any, can afford a dedicated, full-scale …

Inside the social network's (datacenter) network

A Roy, H Zeng, J Bagga, G Porter… - Proceedings of the 2015 …, 2015 - dl.acm.org
Large cloud service providers have invested in increasingly larger datacenters to house the
computing infrastructure required to support their services. Accordingly, researchers and …

Machine learning applications for data center optimization

J Gao, R Jamidar - Google White Paper, 2014 - research.google.com
The rapid adoption of Internet-enabled devices, coupled with the shift from consumer-side
computing to SaaS and cloud-based systems, is accelerating the growth of large-scale data …

Accelerometer: Understanding acceleration opportunities for data center overheads at hyperscale

A Sriraman, A Dhanotia - Proceedings of the Twenty-Fifth International …, 2020 - dl.acm.org
At global user population scale, important microservices in warehouse-scale data centers
can grow to account for an enormous installed base of servers. With the end of Dennard …

Elastic parameter server load distribution in deep learning clusters

Y Chen, Y Peng, Y Bao, C Wu, Y Zhu… - Proceedings of the 11th …, 2020 - dl.acm.org
In distributed DNN training, parameter servers (PS) can become performance bottlenecks
due to PS stragglers, caused by imbalanced parameter distribution, bandwidth contention …