Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …

Qsparse-local-SGD: Distributed SGD with quantization, sparsification, and local computations

D Basu, D Data, C Karakus… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
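
The title combines three compression levers: quantizing updates, sparsifying them, and taking several local SGD steps between communication rounds. The following NumPy sketch is only an illustration of that combination in a hedged form; the toy quadratic objective, the top-k-with-sign compressor, the step size, and the sync period H are all assumptions for exposition, not the paper's algorithm or experimental setup.

    import numpy as np

    def topk_sign_compress(v, k):
        """Keep the k largest-magnitude coordinates, then quantize them to
        sign(v_i) times the mean magnitude of the kept coordinates."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = np.sign(v[idx]) * np.mean(np.abs(v[idx]))
        return out

    rng = np.random.default_rng(0)
    d, workers, k, H, lr, T = 50, 4, 5, 10, 0.05, 200
    x = np.zeros(d)                                   # global model
    local = [x.copy() for _ in range(workers)]        # per-worker local models
    memory = [np.zeros(d) for _ in range(workers)]    # per-worker error memory
    target = rng.normal(size=d)                       # optimum of the toy quadratic loss

    for t in range(1, T + 1):
        for w in range(workers):
            g = (local[w] - target) + 0.1 * rng.normal(size=d)  # stochastic gradient
            local[w] -= lr * g                                  # local SGD step
        if t % H == 0:                                          # communicate every H steps
            updates = []
            for w in range(workers):
                delta = memory[w] + (local[w] - x)      # accumulated local progress + past error
                comp = topk_sign_compress(delta, k)     # sparsify and quantize before sending
                memory[w] = delta - comp                # keep whatever was not transmitted
                updates.append(comp)
            x = x + np.mean(updates, axis=0)            # server averages the compressed updates
            local = [x.copy() for _ in range(workers)]  # workers resynchronize
    print("distance to optimum:", np.linalg.norm(x - target))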

A convergence analysis of distributed SGD with communication-efficient gradient sparsification

S Shi, K Zhao, Q Wang, Z Tang, X Chu - IJCAI, 2019 - ijcai.org
Gradient sparsification is a promising technique to significantly reduce the communication
overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms …
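
For context on the compressor this line of work analyzes, a top-k gradient sparsifier can be written in a few lines of NumPy; the vector size and k below are arbitrary choices for illustration.

    import numpy as np

    def topk_sparsify(grad, k):
        """Zero out everything except the k largest-magnitude entries of grad."""
        sparse = np.zeros_like(grad)
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        sparse[idx] = grad[idx]
        return sparse

    g = np.random.default_rng(1).normal(size=1000)
    print(np.count_nonzero(topk_sparsify(g, 10)))  # only 10 of 1000 entries are sent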

AC-SGD: Adaptively compressed SGD for communication-efficient distributed learning

G Yan, T Li, SL Huang, T Lan… - IEEE Journal on Selected …, 2022 - ieeexplore.ieee.org
Gradient compression (e.g., gradient quantization and gradient sparsification) is a core
technique in reducing communication costs in distributed learning systems. The recent trend …
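
The snippet names gradient quantization alongside sparsification. As a generic illustration only, not AC-SGD's adaptive scheme, a QSGD-style stochastic quantizer rounds normalized magnitudes onto a small number of levels while remaining unbiased; the number of levels below is an arbitrary assumption.

    import numpy as np

    def stochastic_quantize(v, levels=4, rng=np.random.default_rng(2)):
        """Randomly round |v_i| / ||v|| onto `levels` uniform levels so that the
        quantized vector is an unbiased estimate of v."""
        norm = np.linalg.norm(v)
        if norm == 0:
            return np.zeros_like(v)
        scaled = np.abs(v) / norm * levels
        lower = np.floor(scaled)
        q = lower + (rng.random(v.shape) < (scaled - lower))  # round up with prob. = fractional part
        return np.sign(v) * q * norm / levels

    print(stochastic_quantize(np.array([0.3, -1.2, 0.05, 2.0])))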

A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …

Sparsified SGD with memory

SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge-scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
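
The "memory" in the title refers to an error-feedback buffer that accumulates whatever the sparsifier drops and re-injects it into later steps. Below is a minimal sketch of top-k SGD with such a memory on an assumed noisy quadratic objective; the dimension, k, and step size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    d, k, lr, T = 100, 5, 0.1, 500
    x = np.zeros(d)
    memory = np.zeros(d)                    # residual the compressor has dropped so far
    target = rng.normal(size=d)             # optimum of the toy quadratic loss

    for _ in range(T):
        g = (x - target) + 0.1 * rng.normal(size=d)  # stochastic gradient
        corrected = memory + lr * g                  # add back the past compression error
        step = np.zeros(d)
        idx = np.argpartition(np.abs(corrected), -k)[-k:]
        step[idx] = corrected[idx]                   # top-k of the error-corrected step
        memory = corrected - step                    # remember what was dropped
        x -= step
    print("distance to optimum:", np.linalg.norm(x - target))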

rTop-k: A statistical estimation approach to distributed SGD

LP Barnes, HA Inan, B Isik… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
The large communication cost for exchanging gradients between different nodes
significantly limits the scalability of distributed training for large-scale learning models …
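
As the title suggests, rTop-k first restricts attention to the k largest-magnitude coordinates and then transmits a random subset of r of them. A hedged sketch of that selection rule follows; any rescaling or encoding the paper applies on top of it is omitted here.

    import numpy as np

    def rtopk(grad, k, r, rng=np.random.default_rng(4)):
        """Keep a uniformly random subset of size r from the k largest-magnitude coordinates."""
        out = np.zeros_like(grad)
        topk_idx = np.argpartition(np.abs(grad), -k)[-k:]
        chosen = rng.choice(topk_idx, size=r, replace=False)
        out[chosen] = grad[chosen]
        return out

    g = np.random.default_rng(5).normal(size=1000)
    print(np.count_nonzero(rtopk(g, k=50, r=10)))  # 10 coordinates survive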

DoubleSqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression

H Tang, C Yu, X Lian, T Zhang… - … Conference on Machine …, 2019 - proceedings.mlr.press
A standard approach in large-scale machine learning is distributed stochastic gradient
training, which requires the computation of aggregated stochastic gradients over multiple …
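
The "double-pass" in the title refers to compressing both the worker-to-server messages and the aggregated server-to-worker message, each with its own error-compensation buffer. A minimal NumPy sketch under those assumptions (top-k as the compressor, a toy quadratic objective) is below.

    import numpy as np

    def topk(v, k):
        """Top-k compressor used on both the worker and server side in this sketch."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(6)
    d, workers, k, lr, T = 100, 4, 10, 0.1, 300
    x = np.zeros(d)
    worker_err = [np.zeros(d) for _ in range(workers)]
    server_err = np.zeros(d)
    target = rng.normal(size=d)               # optimum of the toy quadratic loss

    for _ in range(T):
        sent = []
        for w in range(workers):
            g = (x - target) + 0.1 * rng.normal(size=d)
            v = g + worker_err[w]             # worker-side error compensation
            c = topk(v, k)
            worker_err[w] = v - c             # store what the worker failed to send
            sent.append(c)
        agg = np.mean(sent, axis=0) + server_err  # server-side error compensation
        back = topk(agg, k)                       # second compression pass, server to workers
        server_err = agg - back
        x -= lr * back
    print("distance to optimum:", np.linalg.norm(x - target))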

Gradient sparsification for communication-efficient distributed optimization

J Wangni, J Wang, J Liu… - Advances in Neural …, 2018 - proceedings.neurips.cc
Modern large-scale machine learning applications require stochastic optimization
algorithms to be implemented on distributed computational architectures. A key bottleneck is …
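
A common alternative to deterministic top-k, and the flavor suggested by this title, is randomized sparsification that keeps coordinate i with some probability p_i and rescales it by 1/p_i so the compressed gradient stays unbiased. The sketch below uses magnitude-proportional probabilities capped at 1 as an assumed concrete choice, not necessarily the paper's exact probabilities.

    import numpy as np

    def random_sparsify(grad, budget, rng=np.random.default_rng(7)):
        """Keep coordinate i with probability p_i and rescale by 1/p_i, so the result
        is an unbiased estimate of grad. Here p_i is proportional to |grad_i|, capped
        at 1, with `budget` controlling the expected number of kept coordinates."""
        p = np.minimum(1.0, budget * np.abs(grad) / np.sum(np.abs(grad)))
        keep = rng.random(grad.shape) < p
        out = np.zeros_like(grad)
        out[keep] = grad[keep] / p[keep]
        return out

    g = np.random.default_rng(8).normal(size=1000)
    print(np.count_nonzero(random_sparsify(g, budget=50)))  # about 50 nonzeros in expectation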

SQuARM-SGD: Communication-efficient momentum SGD for decentralized optimization

N Singh, D Data, J George… - IEEE Journal on Selected …, 2021 - ieeexplore.ieee.org
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm
for decentralized training of large-scale machine learning models over a network. In …
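
SQuARM-SGD combines momentum with compressed, infrequent communication in a decentralized setting. The sketch below is only a loose, centralized stand-in: each worker keeps a local momentum buffer and exchanges a top-k-compressed, error-compensated update; the paper's gossip averaging, quantization, and event-triggered communication rule are omitted, and all constants are illustrative assumptions.

    import numpy as np

    def topk(v, k):
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(9)
    d, workers, k, lr, beta, T = 100, 4, 10, 0.05, 0.9, 400
    x = np.zeros(d)
    mom = [np.zeros(d) for _ in range(workers)]   # per-worker momentum buffers
    err = [np.zeros(d) for _ in range(workers)]   # per-worker compression error
    target = rng.normal(size=d)                   # optimum of the toy quadratic loss

    for _ in range(T):
        sent = []
        for w in range(workers):
            g = (x - target) + 0.1 * rng.normal(size=d)
            mom[w] = beta * mom[w] + g            # local momentum update
            v = err[w] + lr * mom[w]
            c = topk(v, k)                        # only a compressed update is exchanged
            err[w] = v - c
            sent.append(c)
        x -= np.mean(sent, axis=0)                # plain averaging stands in for gossip here
    print("distance to optimum:", np.linalg.norm(x - target))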