Communication-Efficient Distributed Stochastic Gradient Descent with Pooling Operator

Z Cai, A Chen, Y Luo, J Li - Available at SSRN 4327869, 2023 - papers.ssrn.com
Training deep neural networks on large datasets can be accelerated by distributing the
computations across multiple worker nodes. The distributed stochastic gradient descent (D …
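The snippet cuts off before describing the pooling operator itself. As a rough illustration of the general idea, the sketch below compresses a flattened gradient with 1-D average pooling and expands it back by repetition on the receiving side; the function names and the repeat-style decompression are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pool_compress(grad, window=4):
    """Compress a gradient vector by 1-D average pooling (illustrative sketch)."""
    g = grad.ravel()
    pad = (-len(g)) % window                      # pad so the length divides evenly
    g = np.concatenate([g, np.zeros(pad)])
    return g.reshape(-1, window).mean(axis=1)     # one value per pooling window

def pool_decompress(pooled, window, orig_len):
    """Expand pooled values back to the original length by repetition."""
    return np.repeat(pooled, window)[:orig_len]

# Toy usage: each worker would send the pooled gradient instead of the full one.
grad = np.random.randn(1000)
msg = pool_compress(grad, window=4)               # 4x fewer floats on the wire
approx = pool_decompress(msg, window=4, orig_len=grad.size)
print(msg.size, np.linalg.norm(grad - approx) / np.linalg.norm(grad))
```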

A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …
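For context on the two compression families the analysis covers, here is a minimal sketch combining top-k sparsification with 1-bit (sign-plus-scale) quantization of the surviving entries; the specific compressors and parameters are assumptions chosen for illustration, not the exact schemes analyzed in the paper.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries; send (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def quantize_1bit(values):
    """1-bit quantization of the kept values: signs plus one shared scale."""
    scale = np.mean(np.abs(values))
    return np.sign(values), scale

def dequantize(signs, scale):
    return signs * scale

# One simulated worker step: sparsify first, then quantize what is kept.
grad = np.random.randn(10_000)
idx, vals = topk_sparsify(grad, k=100)
signs, scale = quantize_1bit(vals)
recovered = np.zeros_like(grad)
recovered[idx] = dequantize(signs, scale)
```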

Layer-Wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees

S Shi, Z Tang, Q Wang, K Zhao, X Chu - ECAI, 2020 - ebooks.iospress.nl
To reduce the long training time of large deep neural network (DNN) models, distributed
synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers …
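A minimal sketch of the layer-wise idea: instead of one global top-k over the concatenated gradient, each layer gets its own sparsification budget. The adaptation rule below (budget proportional to each layer's share of the total gradient norm) is an illustrative assumption, not the rule proposed in the paper.

```python
import numpy as np

def layerwise_adaptive_topk(layer_grads, total_budget):
    """Split a global top-k budget across layers, proportionally to each
    layer's share of the total gradient norm (illustrative rule)."""
    norms = np.array([np.linalg.norm(g) for g in layer_grads])
    shares = norms / norms.sum()
    sparse = []
    for g, share in zip(layer_grads, shares):
        k = min(max(1, int(round(share * total_budget))), g.size)
        idx = np.argpartition(np.abs(g.ravel()), -k)[-k:]
        sparse.append((idx, g.ravel()[idx]))      # (indices, values) per layer
    return sparse

layers = [np.random.randn(n) for n in (300, 2_000, 10_000)]
messages = layerwise_adaptive_topk(layers, total_budget=500)
print([len(idx) for idx, _ in messages])
```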

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

S Bhattacharya, W Yu, FT Chowdhury… - … Conference on Cluster …, 2021 - ieeexplore.ieee.org
Large neural network models present a hefty communication challenge to distributed
Stochastic Gradient Descent (SGD), with a per-iteration communication complexity of O(n) …
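The snippet does not spell out how the two-level averaging achieves constant-size messages. One way to make the per-worker payload independent of model size, sketched below, is to summarize each gradient by two scalars (the means of its positive and negative entries) and rebuild an approximation from a locally kept sign mask; this summary-and-reconstruction rule is an assumption for illustration, not the paper's algorithm.

```python
import numpy as np

def two_value_summary(grad):
    """Summarize a gradient with two scalars: the mean of its positive entries
    and the mean of its negative entries (a constant-size message)."""
    pos, neg = grad[grad > 0], grad[grad < 0]
    return (pos.mean() if pos.size else 0.0,
            neg.mean() if neg.size else 0.0)

def reconstruct(signs, pos_avg, neg_avg):
    """Rebuild an approximate gradient from a locally kept sign mask and the
    globally averaged two-value summary."""
    return np.where(signs > 0, pos_avg, np.where(signs < 0, neg_avg, 0.0))

# Simulate 4 workers: each sends two floats, independent of the model size.
grads = [np.random.randn(10_000) for _ in range(4)]
summaries = [two_value_summary(g) for g in grads]
pos_avg = np.mean([s[0] for s in summaries])
neg_avg = np.mean([s[1] for s in summaries])
approx = [reconstruct(np.sign(g), pos_avg, neg_avg) for g in grads]
```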

CE-SGD: Communication-efficient distributed machine learning

Z Tao, Q Xia, Q Li, S Cheng - 2021 IEEE Global …, 2021 - ieeexplore.ieee.org
Training large-scale machine learning models usually demands a distributed approach to
process the huge amount of training data efficiently. However, the high network …

Distributed Stochastic Gradient Descent With Compressed and Skipped Communication

TT Phuong, K Fukushima - IEEE Access, 2023 - ieeexplore.ieee.org
This paper introduces CompSkipDSGD, a new algorithm for distributed stochastic gradient
descent that aims to improve communication efficiency by compressing and selectively …
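A hedged sketch of how compression and communication skipping can be combined: gradients are top-k compressed against a local residual, and a round is skipped entirely when the compressed payload is small, with the skipped update folded back into the residual. The skip criterion and threshold below are illustrative assumptions, not CompSkipDSGD's actual rule.

```python
import numpy as np

def maybe_communicate(grad, residual, k, skip_threshold):
    """Compress with top-k; skip the round entirely if the compressed update is
    small, accumulating it locally instead (the skip rule here is illustrative)."""
    g = grad + residual                               # carry over what was not sent
    idx = np.argpartition(np.abs(g), -k)[-k:]
    payload = np.zeros_like(g)
    payload[idx] = g[idx]
    if np.linalg.norm(payload) < skip_threshold:
        return None, g                                # skip: keep everything locally
    return (idx, g[idx]), g - payload                 # send sparse payload, keep rest

residual = np.zeros(10_000)
for step in range(3):
    grad = np.random.randn(10_000) * 0.01
    msg, residual = maybe_communicate(grad, residual, k=100, skip_threshold=0.5)
    print(f"step {step}: {'skipped' if msg is None else 'sent'}")
```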

Communication-compressed adaptive gradient method for distributed nonconvex optimization

Y Wang, L Lin, J Chen - International Conference on Artificial …, 2022 - proceedings.mlr.press
Due to the explosion in the size of the training datasets, distributed learning has received
growing interest in recent years. One of the major bottlenecks is the large communication …
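To make the setting concrete, the sketch below applies an Adam-style adaptive update to an average of compressed worker gradients; the top-k compressor and the plain Adam-style moment estimates are illustrative stand-ins, since the snippet does not specify the paper's compressor or its exact adaptive scheme.

```python
import numpy as np

def compress_topk(g, k):
    """Top-k compressor applied on each worker (an assumption, not from the paper)."""
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out

def adam_style_step(theta, avg_grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adaptive (Adam-style) update applied to the averaged, compressed gradient."""
    m = b1 * m + (1 - b1) * avg_grad
    v = b2 * v + (1 - b2) * avg_grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta = np.zeros(1_000)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 4):
    worker_grads = [np.random.randn(1_000) for _ in range(4)]
    avg = np.mean([compress_topk(g, k=50) for g in worker_grads], axis=0)
    theta, m, v = adam_style_step(theta, avg, m, v, t)
```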

Sparse communication for training deep networks

NF Eghlidi, M Jaggi - arXiv preprint arXiv:2009.09271, 2020 - arxiv.org
Synchronous stochastic gradient descent (SGD) is the most common method used for
distributed training of deep learning models. In this algorithm, each worker shares its local …
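For reference, this is the synchronous data-parallel loop the snippet describes and that the sparsification methods above modify: each worker computes a gradient on its own shard, the gradients are averaged (an all-reduce in practice), and every replica applies the same update. The toy quadratic objective is purely illustrative.

```python
import numpy as np

def sync_sgd_step(theta, worker_grad_fns, lr):
    """One synchronous data-parallel step: every worker computes a local gradient
    on its own shard, the gradients are averaged, and all replicas apply the
    same update (this averaging is the communication these papers compress)."""
    local_grads = [g(theta) for g in worker_grad_fns]     # parallel in practice
    avg_grad = np.mean(local_grads, axis=0)               # all-reduce in practice
    return theta - lr * avg_grad

# Toy setup: 4 workers, each with its own quadratic "shard" of the objective.
targets = [np.full(10, c) for c in (1.0, 2.0, 3.0, 4.0)]
workers = [lambda t, c=c: 2 * (t - c) for c in targets]
theta = np.zeros(10)
for _ in range(200):
    theta = sync_sgd_step(theta, workers, lr=0.05)
print(theta[:3])                                          # converges toward 2.5
```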

Detached error feedback for distributed SGD with random sparsification

A Xu, H Huang - International Conference on Machine …, 2022 - proceedings.mlr.press
The communication bottleneck has been a critical problem in large-scale distributed deep
learning. In this work, we study distributed SGD with random block-wise sparsification as the …
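A sketch of the classical error-feedback loop with random block-wise sparsification: the part of the gradient that is not transmitted is kept in a local error buffer and added back before the next compression. The paper's "detached" variant modifies this scheme in a way the snippet does not describe, so the code below shows only the standard baseline it builds on.

```python
import numpy as np

def random_block_sparsify(g, block_size, keep_frac, rng):
    """Keep a random subset of fixed-size blocks; zero out the rest."""
    n_blocks = int(np.ceil(g.size / block_size))
    keep = rng.random(n_blocks) < keep_frac
    mask = np.repeat(keep, block_size)[:g.size]
    return g * mask

rng = np.random.default_rng(0)
error = np.zeros(10_000)                     # local error-feedback memory
for step in range(3):
    grad = np.random.randn(10_000)
    corrected = grad + error                 # add back what was dropped before
    sent = random_block_sparsify(corrected, block_size=64, keep_frac=0.1, rng=rng)
    error = corrected - sent                 # remember the dropped part
```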

Gradient noise convolution (GNC): Smoothing loss function for distributed large-batch SGD

K Haruki, T Suzuki, Y Hamakawa, T Toda… - arXiv preprint arXiv …, 2019 - arxiv.org
Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep
learning because of its training-time efficiency; however, extremely large-batch SGD leads to …
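As a rough illustration of smoothing via gradient noise, the sketch below takes each SGD step at a point perturbed by a sample of stored gradient noise, which approximately optimizes a noise-smoothed loss; the buffer of noise samples and the toy quadratic objective are assumptions for illustration, not GNC's exact procedure.

```python
import numpy as np

def gnc_step(theta, grad_fn, noise_buffer, lr, rng):
    """Take an SGD step at a point perturbed by a sample of past gradient noise,
    which approximately optimizes a noise-smoothed loss (illustrative sketch)."""
    perturb = noise_buffer[rng.integers(len(noise_buffer))] if noise_buffer else 0.0
    grad = grad_fn(theta + perturb)              # gradient at the perturbed point
    return theta - lr * grad

# Toy quadratic loss; the "noise" stands in for the gap between small-batch
# and large-batch gradients.
rng = np.random.default_rng(0)
theta = np.ones(10)
noise_buffer = [0.1 * rng.standard_normal(10) for _ in range(5)]
for _ in range(100):
    theta = gnc_step(theta, lambda t: 2 * t, noise_buffer, lr=0.05, rng=rng)
print(np.linalg.norm(theta))
```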