CC Chiu, X Zhang, T He, S Wang… - IEEE Journal on …, 2023 - ieeexplore.ieee.org
We consider the problem of training a given machine learning model by decentralized parallel stochastic gradient descent over training data distributed across multiple nodes …
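The snippet describes the decentralized parallel SGD setting, where each node mixes parameters with its neighbours and then takes a local gradient step. Below is a minimal sketch of one such round, assuming a ring topology, uniform mixing weights, and a simple least-squares loss; the function names (`local_gradient`, `dpsgd_round`) are illustrative and not taken from the cited paper.

```python
# Minimal sketch of one decentralized parallel SGD round (ring topology,
# uniform 1/3 mixing weights). Illustrative only, not the cited algorithm.
import numpy as np

def local_gradient(w, X, y):
    # Gradient of the least-squares loss 0.5 * ||X w - y||^2 on this node's shard.
    return X.T @ (X @ w - y)

def dpsgd_round(weights, data, lr=0.01):
    """weights: list of per-node parameter vectors; data: list of (X, y) shards."""
    n = len(weights)
    new_weights = []
    for i in range(n):
        # Average with ring neighbours, then step on the node's own local gradient.
        mixed = (weights[(i - 1) % n] + weights[i] + weights[(i + 1) % n]) / 3.0
        X, y = data[i]
        new_weights.append(mixed - lr * local_gradient(weights[i], X, y))
    return new_weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_nodes = 5, 4
    w_true = rng.normal(size=d)
    shards = [rng.normal(size=(20, d)) for _ in range(n_nodes)]
    data = [(X, X @ w_true) for X in shards]
    weights = [np.zeros(d) for _ in range(n_nodes)]
    for _ in range(200):
        weights = dpsgd_round(weights, data)
    print(np.linalg.norm(weights[0] - w_true))  # distance to the target model
```

Only neighbour-to-neighbour exchanges are needed per round, which is what makes the decentralized variant attractive when all-to-all or server-based communication is the bottleneck.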
When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an …
Y Yu, J Wu, L Huang - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Modern distributed training of machine learning models often suffers from high communication overhead for synchronizing stochastic gradients and model parameters. In …
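Since the overhead here comes from synchronizing full-precision gradients and parameters, a common remedy is to quantize vectors to a few bits before they are sent. The sketch below shows generic unbiased stochastic uniform quantization; it is an illustration of the idea, not the specific scheme of the cited paper.

```python
# Generic low-bit stochastic quantization of a vector before communication.
# Unbiased in expectation; illustrative only, not the cited paper's scheme.
import numpy as np

def stochastic_quantize(v, bits=4, rng=None):
    """Quantize v onto 2**bits uniform levels between its min and max,
    using stochastic rounding so that E[quantized] == v."""
    rng = rng or np.random.default_rng()
    lo, hi = v.min(), v.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    pos = (v - lo) / scale              # position on the quantization grid
    floor = np.floor(pos)
    prob_up = pos - floor               # round up with this probability
    q = floor + (rng.random(v.shape) < prob_up)
    return lo + q * scale               # dequantized values the receiver reconstructs

# A worker would transmit the small integer indices plus (lo, scale)
# instead of full float32 gradients or parameters.
```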
In the last few years, distributed machine learning has usually been executed over heterogeneous networks, such as a local area network within a multi-tenant cluster or a wide …
P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …
The present paper develops a novel aggregated gradient approach for distributed machine learning that adaptively compresses the gradient communication. The key idea is to first …
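The snippet cuts off before stating the paper's key idea, so the following is only a generic illustration of compressed gradient communication as used throughout this line of work: top-k sparsification with local error feedback. It is not the cited paper's adaptive aggregation rule, and the names (`topk_compress`, `compressed_step`) are hypothetical.

```python
# Generic top-k gradient sparsification with error feedback.
# Illustration of compressed communication, not the cited paper's method.
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries of grad; zero out the rest."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    return sparse

def compressed_step(grad, residual, k):
    """Add the residual carried over from previous rounds (error feedback),
    transmit only the sparsified part, and keep the remainder locally."""
    corrected = grad + residual
    sent = topk_compress(corrected, k)
    return sent, corrected - sent   # (message to transmit, new residual)
```

Keeping the compression error locally and re-injecting it in later rounds is the standard way such schemes avoid losing small but persistent gradient components.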
X Zhang, Y Wang, S Chen, C Wang, D Yu… - Journal of Systems …, 2023 - Elsevier
In this paper, we propose a robust communication-efficient decentralized learning algorithm, named RCEDL, to address data heterogeneity, communication heterogeneity and …
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to …
Powerful computer clusters are nowadays used to train complex deep neural networks (DNNs) on large datasets. Distributed training is increasingly becoming communication-bound …