Distributed stochastic gradient descent (SGD) algorithms are widely deployed in training large-scale deep learning models, but the communication overhead among workers …
Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms used to train large neural networks. In recent years, there has …
Distributed deep learning (DL) has become prevalent in recent years as a way to reduce training time by leveraging multiple computing devices (e.g., GPUs/TPUs), driven by larger models and …
S Shi, Q Wang, X Chu, B Li, Y Qin… - IEEE INFOCOM 2020 …, 2020 - ieeexplore.ieee.org
Distributed synchronous stochastic gradient descent (SGD) algorithms are widely used in large-scale deep learning applications, but it is known that the communication bottleneck …
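The synchronization pattern behind these papers is the same: every worker computes a gradient on its own data shard, the gradients are averaged across workers (the all-reduce step), and each model replica applies the identical update. The following is a minimal single-process simulation of that loop in NumPy; the worker count, least-squares loss, and learning rate are illustrative assumptions, not taken from any of the cited works.

```python
import numpy as np

# Minimal single-process simulation of synchronous SGD across W workers.
# Each worker computes a gradient on its own data shard; the gradients are
# then averaged (the role of the all-reduce) before the shared model is
# updated. Loss: least squares, f(x) = ||A x - b||^2 / (2n).

rng = np.random.default_rng(0)
W, n, d = 4, 400, 10                       # workers, samples, dimensions
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)
shards = np.array_split(np.arange(n), W)   # one data shard per worker

x = np.zeros(d)                            # shared model replica
lr = 0.1
for step in range(200):
    # Local gradient on each worker's shard.
    grads = []
    for idx in shards:
        Ai, bi = A[idx], b[idx]
        grads.append(Ai.T @ (Ai @ x - bi) / len(idx))
    # "All-reduce": average the per-worker gradients. In a real multi-node
    # deployment this is the communication step that becomes the bottleneck.
    g = np.mean(grads, axis=0)
    x -= lr * g

print("final loss:", np.mean((A @ x - b) ** 2) / 2)
```

Because every worker sends a dense gradient of the full model size each iteration, communication volume grows with the model, which is exactly the cost the compression and sparsification work below tries to cut.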
Y Yu, J Wu, L Huang - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Modern distributed training of machine learning models often suffers from the high communication overhead of synchronizing stochastic gradients and model parameters. In …
J Wu, W Huang, J Huang… - … Conference on Machine …, 2018 - proceedings.mlr.press
Large-scale distributed optimization is of great importance in various applications. For data-parallel distributed learning, the inter-node gradient communication often becomes …
The communication bottleneck has been identified as a significant issue in the distributed optimization of large-scale learning models. Recently, several approaches to mitigate this …
Powerful computer clusters are nowadays used to train complex deep neural networks (DNNs) on large datasets. Distributed training workloads increasingly become …
S Li, T Hoefler - Proceedings of the 27th ACM SIGPLAN Symposium on …, 2022 - dl.acm.org
Communication overhead is one of the major obstacles to training large deep learning models at scale. Gradient sparsification is a promising technique to reduce the communication …
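Top-k gradient sparsification is the common baseline behind this line of work: each worker transmits only the k largest-magnitude gradient coordinates and carries the un-sent remainder forward as an error-feedback residual, so small coordinates are eventually communicated too. The sketch below is a generic single-worker illustration of that idea; the function name topk_sparsify, the 10% density, and the random gradients are assumptions for demonstration and do not reproduce any specific cited paper's method.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad; zero the rest.
    Returns the sparse gradient (what gets sent) and the residual left behind."""
    flat = np.abs(grad).ravel()
    if k >= flat.size:
        return grad.copy(), np.zeros_like(grad)
    top_idx = np.argpartition(flat, -k)[-k:]       # indices of k largest |g_i|
    mask = np.zeros(grad.size, dtype=bool)
    mask[top_idx] = True
    mask = mask.reshape(grad.shape)
    sparse = np.where(mask, grad, 0.0)
    return sparse, grad - sparse

# Error feedback: un-sent residuals accumulate and are added back to the
# next gradient before selection, preserving the dropped information.
rng = np.random.default_rng(1)
residual = np.zeros(100)
for step in range(5):
    grad = rng.normal(size=100)                    # stand-in for a real gradient
    corrected = grad + residual
    sparse, residual = topk_sparsify(corrected, k=10)  # send 10% of entries
    density = np.count_nonzero(sparse) / sparse.size
    print(f"step {step}: sent {density:.0%} of coordinates")
```

In a distributed setting only the k index/value pairs are exchanged per worker, so the communicated volume drops from the full model size to roughly k entries per step, at the cost of the selection overhead and a compression-induced approximation that error feedback keeps bounded.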