EFLOPS: Algorithm and system co-design for a high performance distributed training platform

J Dong, Z Cao, T Zhang, J Ye, S Wang… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Deep neural networks (DNNs) have gained tremendous attention as compelling solutions
for applications such as image classification, object detection, speech recognition, and so …

An in-depth analysis of distributed training of deep neural networks

Y Ko, K Choi, J Seo, SW Kim - 2021 IEEE International Parallel …, 2021 - ieeexplore.ieee.org
As the popularity of deep learning in industry rapidly grows, efficient training of deep neural
networks (DNNs) becomes important. To train a DNN with a large amount of data, distributed …
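
Analyses like this one typically contrast parameter-server and all-reduce communication patterns. As a generic illustration (not code from the paper), the pure-Python sketch below simulates ring all-reduce, the pattern most of these systems rely on, as a reduce-scatter phase followed by an all-gather phase over in-memory lists; the function name ring_all_reduce and the toy worker data are made up for the example.

# Minimal single-process simulation of ring all-reduce (reduce-scatter + all-gather).
# Illustrative sketch only; real implementations exchange chunks over the network.

def ring_all_reduce(grads):
    """grads: list of per-worker gradient lists, all of equal length."""
    n = len(grads)                      # number of workers in the ring
    size = len(grads[0])
    chunk = (size + n - 1) // n         # elements per chunk
    # Split every worker's gradient into n chunks.
    chunks = [[g[i * chunk:(i + 1) * chunk] for i in range(n)] for g in grads]

    # Phase 1: reduce-scatter. After n-1 steps, worker w holds the fully
    # reduced chunk (w + 1) % n.
    for step in range(n - 1):
        for w in range(n):
            dst = (w + 1) % n
            c = (w - step) % n          # chunk index worker w sends this step
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], chunks[w][c])]

    # Phase 2: all-gather. Circulate the reduced chunks so every worker ends
    # up with the complete reduced gradient.
    for step in range(n - 1):
        for w in range(n):
            dst = (w + 1) % n
            c = (w + 1 - step) % n      # chunk already fully reduced at worker w
            chunks[dst][c] = list(chunks[w][c])

    return [[x for c in chunks[w] for x in c] for w in range(n)]

if __name__ == "__main__":
    workers = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0],
               [100.0, 200.0, 300.0, 400.0], [0.0, 0.0, 0.0, 0.0]]
    reduced = ring_all_reduce(workers)
    print(reduced[0])   # every worker now holds the element-wise sum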

HPDL: Towards a general framework for high-performance distributed deep learning

D Li, Z Lai, K Ge, Y Zhang, Z Zhang… - 2019 IEEE 39th …, 2019 - ieeexplore.ieee.org
With the growing scale of data volume and neural network size, we have entered the era
of distributed deep learning. High-performance training and inference on distributed …

A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks

Y Li, J Park, M Alian, Y Yuan, Z Qu… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Training real-world Deep Neural Networks (DNNs) can take an eon (i.e., weeks or months)
without leveraging distributed systems. Even distributed training takes inordinate time, of …
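
A recurring idea behind such network-centric designs is shrinking the gradient traffic before it hits the wire. The sketch below shows a generic software-level form of this, simple top-k gradient sparsification in PyTorch; it is not the paper's scheme (which compresses gradients in hardware on the network path), and the helper names topk_compress/topk_decompress are invented for the illustration.

# Generic top-k gradient sparsification before communication: only the largest
# entries (and their indices) would be sent; the rest are dropped as zeros.
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices], grad.shape      # what would go on the wire

def topk_decompress(indices, values, shape):
    """Rebuild a dense gradient with zeros in the dropped positions."""
    out = torch.zeros(shape, dtype=values.dtype)
    out.view(-1)[indices] = values
    return out

if __name__ == "__main__":
    g = torch.randn(256, 256)
    idx, vals, shape = topk_compress(g, ratio=0.05)
    g_hat = topk_decompress(idx, vals, shape)
    sent = idx.numel() * (idx.element_size() + vals.element_size())
    full = g.numel() * g.element_size()
    print(f"bytes sent: {sent} vs dense: {full}")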

GradientFlow: Optimizing network performance for large-scale distributed DNN training

P Sun, Y Wen, R Han, W Feng… - IEEE Transactions on Big …, 2019 - ieeexplore.ieee.org
It is important to scale out deep neural network (DNN) training for reducing model training
time. The high communication overhead is one of the major performance bottlenecks for …
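
One widely used way to cut this communication overhead is gradient (tensor) fusion: many small per-layer gradients are packed into one flat buffer so a single all-reduce replaces many small messages. The sketch below is a minimal, hedged illustration of that idea with torch.distributed, not GradientFlow itself; it runs standalone with a gloo group of world size 1 and a placeholder address/port, and fused_all_reduce is a name made up for the example.

# Gradient fusion sketch: concatenate per-layer gradients into one flat buffer,
# all-reduce it once, then scatter the averaged values back into each tensor.
import torch
import torch.distributed as dist

def fused_all_reduce(params):
    grads = [p.grad for p in params if p.grad is not None]
    flat = torch.cat([g.flatten() for g in grads])   # one fused buffer
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)      # one message instead of many
    flat /= dist.get_world_size()                    # average across workers
    offset = 0
    for g in grads:
        n = g.numel()
        g.copy_(flat[offset:offset + n].view_as(g))
        offset += n

if __name__ == "__main__":
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)
    model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                                torch.nn.Linear(16, 1))
    loss = model(torch.randn(4, 8)).sum()
    loss.backward()
    fused_all_reduce(list(model.parameters()))
    dist.destroy_process_group()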

Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes

P Sun, W Feng, R Han, S Yan, Y Wen - arXiv preprint arXiv:1902.06855, 2019 - arxiv.org
It is important to scale out deep neural network (DNN) training for reducing model training
time. The high communication overhead is one of the major performance bottlenecks for …

Accelerated training for CNN distributed deep learning through automatic resource-aware layer placement

JH Park, S Kim, J Lee, M Jeon, SH Noh - arXiv preprint arXiv:1901.05803, 2019 - arxiv.org
The Convolutional Neural Network (CNN) model, often used for image classification,
requires significant training time to obtain high accuracy. To this end, distributed training is …
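
Resource-aware layer placement, in its simplest form, profiles per-layer cost and then assigns contiguous groups of layers to different devices so the stages are roughly balanced. The sketch below illustrates only that basic idea, not the paper's placement algorithm or cost model; the helper names and the two-stage split are assumptions, and the second device falls back to CPU so the code runs anywhere.

# Hedged sketch of cost-driven layer placement: profile layers, pick a balanced
# split point, and run the model as two stages on (possibly) different devices.
import time
import torch
import torch.nn as nn

def profile_layers(layers, sample):
    """Measure a rough forward-pass time for each layer on the CPU."""
    costs, x = [], sample
    for layer in layers:
        start = time.perf_counter()
        x = layer(x)
        costs.append(time.perf_counter() - start)
    return costs

def split_balanced(costs):
    """Pick a split point that best balances total cost between two stages."""
    total, running = sum(costs), 0.0
    best, best_gap = 1, float("inf")
    for i, c in enumerate(costs[:-1], start=1):
        running += c
        gap = abs(total - 2 * running)
        if gap < best_gap:
            best, best_gap = i, gap
    return best

if __name__ == "__main__":
    layers = [nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
              nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
              nn.Flatten(), nn.Linear(32 * 32 * 32, 10)]
    cut = split_balanced(profile_layers(layers, torch.randn(1, 3, 32, 32)))
    dev0 = torch.device("cpu")
    dev1 = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    stage0 = nn.Sequential(*layers[:cut]).to(dev0)
    stage1 = nn.Sequential(*layers[cut:]).to(dev1)
    out = stage1(stage0(torch.randn(1, 3, 32, 32).to(dev0)).to(dev1))
    print(cut, out.shape)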

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth …
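
The generic software analogue of this overlap is to launch an asynchronous all-reduce for each gradient as soon as the backward pass produces it and to synchronize only before the optimizer step. The sketch below illustrates that pattern, not the paper's platform; it assumes PyTorch >= 2.1 (for per-tensor post-accumulate-grad hooks), a gloo process group of world size 1 with a placeholder address/port, and the class name OverlappedAllReduce is invented for the example. Real systems additionally bucket gradients rather than issuing one message per tensor.

# Compute-communication overlap sketch: each parameter's gradient is all-reduced
# asynchronously as soon as autograd finishes accumulating it, so communication
# for later layers overlaps with backward computation of earlier layers.
import torch
import torch.distributed as dist

class OverlappedAllReduce:
    def __init__(self, model):
        self.handles = []
        for p in model.parameters():
            p.register_post_accumulate_grad_hook(self._launch)

    def _launch(self, param):
        # Called by autograd right after param.grad has been accumulated.
        handle = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
        self.handles.append((handle, param))

    def wait(self):
        # Block until all in-flight reductions are done, then average.
        world = dist.get_world_size()
        for handle, param in self.handles:
            handle.wait()
            param.grad /= world
        self.handles.clear()

if __name__ == "__main__":
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29501",
                            rank=0, world_size=1)
    model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 1))
    overlap = OverlappedAllReduce(model)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = model(torch.randn(16, 32)).pow(2).mean()
    loss.backward()          # hooks fire here, launching async all-reduces
    overlap.wait()           # ensure communication finished before the update
    opt.step()
    dist.destroy_process_group()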

Cynthia: Cost-efficient cloud resource provisioning for predictable distributed deep neural network training

H Zheng, F Xu, L Chen, Z Zhou, F Liu - Proceedings of the 48th …, 2019 - dl.acm.org
It has become an increasingly popular trend for deep neural networks with large-scale datasets
to be trained in a distributed manner in the cloud. However, widely known as resource …

Parallel and distributed training of deep neural networks: A brief overview

A Farkas, G Kertész, R Lovas - 2020 IEEE 24th International …, 2020 - ieeexplore.ieee.org
Deep neural networks and deep learning are becoming important and popular techniques in
modern services and applications. The training of these networks is computationally …
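
For a concrete starting point, the sketch below shows plain synchronous data parallelism with PyTorch's DistributedDataParallel on CPU: each process keeps a model replica, computes gradients on its own shard of the batch, and DDP averages the gradients so all replicas take the same optimizer step. The host/port values and the synthetic data are placeholders.

# Minimal data-parallel training sketch using PyTorch DDP with a gloo backend.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29502"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(10, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    # Each rank sees a different shard of the (synthetic) data.
    torch.manual_seed(rank)
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    for _ in range(5):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()      # DDP averages gradients across ranks here
        opt.step()

    if rank == 0:
        print("final loss on rank 0:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)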