GradientFlow: Optimizing network performance for large-scale distributed DNN training

P Sun, Y Wen, R Han, W Feng… - IEEE Transactions on Big Data, 2019 - ieeexplore.ieee.org
It is important to scale out deep neural network (DNN) training to reduce model training
time. High communication overhead is one of the major performance bottlenecks for …

Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes

P Sun, W Feng, R Han, S Yan, Y Wen - arXiv preprint arXiv:1902.06855, 2019 - arxiv.org
It is important to scale out deep neural network (DNN) training to reduce model training
time. High communication overhead is one of the major performance bottlenecks for …

dPRO: A generic performance diagnosis and optimization toolkit for expediting distributed DNN training

H Hu, C Jiang, Y Zhong, Y Peng, C Wu… - Proceedings of Machine Learning and Systems, 2022 - proceedings.mlsys.org
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning
DNN models over large datasets. However, the performance of large-scale distributed …

BlueConnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy

M Cho, U Finkler, D Kung… - Proceedings of Machine Learning and Systems, 2019 - proceedings.mlsys.org
As deep neural networks get more complex and input datasets get larger, it can take days or
even weeks to train a deep neural network to the desired accuracy. Therefore, enabling …
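The decomposition named in the BlueConnect title can be pictured with a generic two-level all-reduce: an intra-node reduce-scatter, an inter-node reduction of each shard, and an intra-node all-gather. The sketch below simulates that pattern with NumPy arrays in a single process; it is an assumption-laden illustration of the general idea, not BlueConnect's actual pipelined algorithm, and the function name, shapes, and group sizes are invented for the example.

# Illustrative sketch only: a generic two-level all-reduce decomposition
# (intra-node reduce-scatter, per-shard inter-node reduction, intra-node
# all-gather), simulated with NumPy instead of a real communication library.
# Not BlueConnect's algorithm; names and shapes are assumptions.
import numpy as np

def hierarchical_allreduce(grads, nodes, gpus_per_node):
    """grads: list of length nodes*gpus_per_node, each a 1-D np.ndarray."""
    dim = grads[0].size
    shard = dim // gpus_per_node                     # each local rank owns one shard

    # Step 1: intra-node reduce-scatter -- rank g on each node sums shard g.
    partial = np.zeros((nodes, gpus_per_node, shard))
    for n in range(nodes):
        for g in range(gpus_per_node):
            lo, hi = g * shard, (g + 1) * shard
            partial[n, g] = sum(grads[n * gpus_per_node + k][lo:hi]
                                for k in range(gpus_per_node))

    # Step 2: inter-node reduction, one independent (concurrent in practice)
    # all-reduce per shard index across the nodes.
    reduced = partial.sum(axis=0)                    # shape: (gpus_per_node, shard)

    # Step 3: intra-node all-gather -- every GPU reassembles the full vector.
    full = np.concatenate([reduced[g] for g in range(gpus_per_node)])
    return [full.copy() for _ in range(nodes * gpus_per_node)]

# Quick check against a flat sum over all ranks: 2 nodes x 4 GPUs, dim 8.
grads = [np.random.rand(8) for _ in range(2 * 4)]
out = hierarchical_allreduce(grads, nodes=2, gpus_per_node=4)
assert np.allclose(out[0], sum(grads))

Splitting the reduction this way lets each level of the hierarchy run a collective sized to its own bandwidth, which is the usual motivation for decomposed all-reduce on heterogeneous networks.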

Parameter Box: High performance parameter servers for efficient distributed deep neural network training

L Luo, J Nelson, L Ceze, A Phanishayee… - arXiv preprint arXiv …, 2018 - arxiv.org
Most work in the deep learning systems community has focused on faster inference, but
arriving at a trained model requires lengthy experiments. Accelerating training lets …
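For context on the parameter-server pattern that the Parameter Box title refers to, the following toy sketch emulates it in a single Python process: workers pull weights, push gradients, and the server applies an SGD step. It is a minimal illustration under assumed names (ParameterServer, push, pull) and says nothing about the paper's hardware-centric design.

# Illustrative sketch only: a single-process emulation of the generic
# parameter-server pattern. Workers push gradients, the server applies an
# SGD update, workers pull fresh weights. Names and sizes are assumptions.
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr

    def push(self, grad):          # worker -> server: apply one gradient
        self.w -= self.lr * grad

    def pull(self):                # server -> worker: latest weights
        return self.w.copy()

def worker_gradient(w, x, y):
    """Least-squares gradient on one worker's mini-batch."""
    return 2.0 * x.T @ (x @ w - y) / len(y)

rng = np.random.default_rng(0)
true_w = rng.normal(size=4)
ps = ParameterServer(dim=4)

for step in range(200):
    for _ in range(2):             # two workers per round, synchronous for simplicity
        x = rng.normal(size=(32, 4))
        y = x @ true_w
        ps.push(worker_gradient(ps.pull(), x, y))

print("recovered weights:", np.round(ps.pull(), 3))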

Domain-specific communication optimization for distributed DNN training

H Wang, J Chen, X Wan, H Tian, J Xia, G Zeng… - arXiv preprint arXiv …, 2020 - arxiv.org
Communication overhead poses an important obstacle to distributed DNN training and has
drawn increasing attention in recent years. Despite continuous efforts, prior solutions such …

Prophet: Speeding up distributed DNN training with predictable communication scheduling

Z Zhang, Q Qi, R Shang, L Chen, F Xu - Proceedings of the 50th International Conference on Parallel Processing, 2021 - dl.acm.org
Optimizing performance for Distributed Deep Neural Network (DDNN) training has recently
become increasingly compelling, as DNN models grow more complex and training datasets …
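The snippet below is a toy timing model of why communication scheduling matters: a backward pass emits gradients back-to-front, while the next forward pass consumes parameters front-to-back, so reordering transfers on a shared link can shorten an iteration. The timings and the FIFO-versus-priority comparison are made-up assumptions for illustration, not Prophet's predictable-scheduling algorithm.

# Illustrative sketch only: simulate one iteration under two transfer orders
# on a single shared link. FIFO sends gradients in production order
# (last layer first); priority sends the lowest-index ready layer first,
# so the next forward pass can begin sooner. All timings are assumptions.
BWD = [4, 4, 4, 4]    # backward compute time per layer (runs layer 3 -> 0)
COMM = [6, 6, 6, 6]   # gradient transfer time per layer (one shared link)
FWD = [2, 2, 2, 2]    # forward compute time per layer (runs layer 0 -> 3)

def iteration_time(priority):
    n = len(BWD)
    # Backward pass runs back-to-front; record when each gradient is ready.
    ready, t = [0.0] * n, 0.0
    for layer in reversed(range(n)):
        t += BWD[layer]
        ready[layer] = t

    # Serve transfers one at a time: FIFO = production order,
    # priority = smallest layer index among gradients already produced.
    done, pending, link_free = [0.0] * n, set(range(n)), 0.0
    while pending:
        now = max(link_free, min(ready[l] for l in pending))
        avail = [l for l in pending if ready[l] <= now]
        layer = min(avail) if priority else max(avail)   # max = production order
        link_free = now + COMM[layer]
        done[layer] = link_free
        pending.remove(layer)

    # Next forward pass: layer i needs its fresh parameters and layer i-1's output.
    t = 0.0
    for layer in range(n):
        t = max(t, done[layer]) + FWD[layer]
    return t

print("FIFO schedule:    ", iteration_time(priority=False))
print("Priority schedule:", iteration_time(priority=True))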

EFLOPS: Algorithm and system co-design for a high performance distributed training platform

J Dong, Z Cao, T Zhang, J Ye, S Wang… - IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020 - ieeexplore.ieee.org
Deep neural networks (DNNs) have gained tremendous attention as compelling solutions
for applications such as image classification, object detection, speech recognition, and so …

NV-Group: Link-efficient reduction for distributed deep learning on modern dense GPU systems

CH Chu, P Kousha, AA Awan, KS Khorassani… - Proceedings of the 34th ACM International Conference on Supercomputing, 2020 - dl.acm.org
Advanced fabrics such as NVIDIA NVLink are enabling the deployment of dense Graphics
Processing Unit (GPU) systems such as DGX-2 and Summit. With the wide adoption of large …

PowerAI DDL

M Cho, U Finkler, S Kumar, D Kung, V Saxena… - arXiv preprint arXiv …, 2017 - arxiv.org
As deep neural networks become more complex and input datasets grow larger, it can take
days or even weeks to train a deep neural network to the desired accuracy. Therefore …