Double quantization for communication-efficient distributed optimization

Y Yu, J Wu, L Huang - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Modern distributed training of machine learning models often suffers from high
communication overhead for synchronizing stochastic gradients and model parameters. In …
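
A minimal sketch of the generic idea the title points at, compressing both the gradients workers upload and the parameters the server broadcasts back; the quantizer, bit width, and function names below are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

def stochastic_uniform_quantize(x, num_levels=16):
    """Uniform quantizer with stochastic rounding (unbiased in expectation).

    Returns the dequantized vector; a real system would transmit the
    integer codes plus the (lo, hi) range instead of full-precision values.
    """
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:                       # constant vector: nothing to quantize
        return x.copy()
    scale = (hi - lo) / (num_levels - 1)
    normalized = (x - lo) / scale      # map into [0, num_levels - 1]
    floor = np.floor(normalized)
    codes = floor + (np.random.rand(*x.shape) < (normalized - floor))
    return lo + codes * scale

# Toy round trip: the worker quantizes its gradient, the server quantizes
# the updated parameters it sends back; that is the "double" in the title.
rng = np.random.default_rng(0)
params = rng.normal(size=1000)
grad = rng.normal(size=1000)

q_grad = stochastic_uniform_quantize(grad)       # worker -> server
params = params - 0.1 * q_grad                   # server-side SGD step
q_params = stochastic_uniform_quantize(params)   # server -> worker
print("grad error:", np.abs(q_grad - grad).mean(),
      "param error:", np.abs(q_params - params).mean())
```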

A linear speedup analysis of distributed deep learning with sparse and quantized communication

P Jiang, G Agrawal - Advances in Neural Information …, 2018 - proceedings.neurips.cc
The large communication overhead has imposed a bottleneck on the performance of
distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous …

Communication-efficient distributed learning via lazily aggregated quantized gradients

J Sun, T Chen, G Giannakis… - Advances in Neural …, 2019 - proceedings.neurips.cc
The present paper develops a novel aggregated gradient approach for distributed machine
learning that adaptively compresses the gradient communication. The key idea is to first …
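
The snippet cuts off before the key idea is stated; as a rough illustration of lazy aggregation in general (not LAQ's actual skipping criterion), the sketch below lets each worker skip an upload whenever its gradient has changed little since the last one it sent, so the server reuses the stale copy. The threshold rule is a placeholder assumption.

```python
import numpy as np

class LazyWorker:
    """Uploads a gradient only when it has changed enough since the last upload.

    Generic lazy-aggregation rule, not the paper's exact criterion;
    `threshold` is an illustrative hyperparameter.
    """
    def __init__(self, dim, threshold=0.05):
        self.last_sent = np.zeros(dim)
        self.threshold = threshold

    def maybe_upload(self, grad):
        innovation = np.linalg.norm(grad - self.last_sent)
        if innovation >= self.threshold * np.linalg.norm(grad):
            self.last_sent = grad.copy()
            return grad, True          # communicate a fresh gradient
        return self.last_sent, False   # server reuses the stale copy

# Server aggregates whatever each worker reports (fresh or stale).
rng = np.random.default_rng(1)
workers = [LazyWorker(dim=10) for _ in range(4)]
for step in range(3):
    reports = [w.maybe_upload(rng.normal(size=10)) for w in workers]
    avg_grad = np.mean([g for g, _ in reports], axis=0)
    sent = sum(fresh for _, fresh in reports)
    print(f"step {step}: {sent}/4 workers communicated")
```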

Convex optimization using sparsified stochastic gradient descent with memory

JB Cordonnier - 2018 - infoscience.epfl.ch
Interest in distributed stochastic optimization has risen with the need to train complex Machine
Learning models with more data on distributed systems. Increasing the computation power …
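
Top-k sparsification with a local memory (error) vector is the pattern named in the title; a minimal sketch of that generic pattern, with an arbitrary k and no claim to match the thesis's analysis, follows.

```python
import numpy as np

def topk_with_memory(grad, memory, k):
    """Keep only the k largest-magnitude entries of (grad + memory);
    the rest are stored back into the memory and re-sent later."""
    corrected = grad + memory
    idx = np.argpartition(np.abs(corrected), -k)[-k:]   # indices of top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]
    new_memory = corrected - sparse                     # what was left out this round
    return sparse, new_memory

rng = np.random.default_rng(2)
memory = np.zeros(1000)
for step in range(5):
    grad = rng.normal(size=1000)
    sparse_grad, memory = topk_with_memory(grad, memory, k=10)
    # only the 10 nonzero values of `sparse_grad` (plus indices) need to be sent
print("residual kept locally:", np.linalg.norm(memory))
```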

Sparse communication for training deep networks

NF Eghlidi, M Jaggi - arXiv preprint arXiv:2009.09271, 2020 - arxiv.org
Synchronous stochastic gradient descent (SGD) is the most common method used for
distributed training of deep learning models. In this algorithm, each worker shares its local …

Bagua: scaling up distributed learning with system relaxations

S Gan, X Lian, R Wang, J Chang, C Liu, H Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent years have witnessed a growing list of systems for distributed data-parallel training.
Existing systems largely fit into two paradigms, i.e., parameter server and MPI-style collective …

Error compensated quantized SGD and its applications to large-scale distributed optimization

J Wu, W Huang, J Huang… - … Conference on Machine …, 2018 - proceedings.mlr.press
Large-scale distributed optimization is of great importance in various applications. For data-
parallel distributed learning, the inter-node gradient communication often becomes …
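
Error compensation follows the same feedback pattern as the sparsification-with-memory entry above, but with a quantizer in place of top-k selection; a rough sketch using a simple sign-and-mean quantizer (an assumption, not the paper's quantization scheme):

```python
import numpy as np

def sign_quantize(x):
    """Cheap 1-bit style quantizer: keep only the sign pattern,
    rescaled so the output preserves the mean magnitude of the input."""
    return np.sign(x) * np.abs(x).mean()

def error_compensated_step(grad, error):
    """Quantize (gradient + accumulated error); carry the new
    quantization error over to the next iteration."""
    corrected = grad + error
    compressed = sign_quantize(corrected)
    return compressed, corrected - compressed

rng = np.random.default_rng(3)
error = np.zeros(1000)
for step in range(5):
    grad = rng.normal(size=1000)
    compressed, error = error_compensated_step(grad, error)
    # `compressed` (a sign pattern plus one scalar) is all that crosses the network
print("accumulated error norm:", np.linalg.norm(error))
```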

Layer-wise adaptive gradient sparsification for distributed deep learning with convergence guarantees

S Shi, Z Tang, Q Wang, K Zhao, X Chu - arXiv preprint arXiv:1911.08727, 2019 - arxiv.org
To reduce the long training time of large deep neural network (DNN) models, distributed
synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers …
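
"Layer-wise" here means the top-k selection is applied per layer rather than over the flattened model; a toy sketch of that per-layer selection follows, with fixed ratios standing in for the paper's adaptive, convergence-guided choice.

```python
import numpy as np

def topk(v, k):
    """Zero out everything but the k largest-magnitude entries of v."""
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

def layerwise_sparsify(layer_grads, ratios):
    """Apply a (possibly different) sparsification ratio to each layer's
    gradient; choosing `ratios` adaptively is the paper's contribution,
    so they are simply fixed here for illustration."""
    return {name: topk(g, max(1, int(ratios[name] * g.size)))
            for name, g in layer_grads.items()}

rng = np.random.default_rng(4)
grads = {"conv1": rng.normal(size=500), "fc": rng.normal(size=2000)}
ratios = {"conv1": 0.05, "fc": 0.01}       # illustrative per-layer densities
sparse = layerwise_sparsify(grads, ratios)
print({name: int(np.count_nonzero(g)) for name, g in sparse.items()})
```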

JointSQ: Joint Sparsification-Quantization for Distributed Learning

W Xie, H Li, J Ma, Y Li, J Lei, D Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Gradient sparsification and quantization offer a promising way to alleviate the
communication overhead problem in distributed learning. However, direct combination of the …
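
The snippet notes that naively stacking the two compressors is problematic; for context on what a joint design has to improve on, here is a sketch of the naive sparsify-then-quantize baseline (not JointSQ itself).

```python
import numpy as np

def naive_sparsify_then_quantize(grad, k, num_levels=16):
    """Baseline pipeline: keep the top-k magnitude entries, then uniformly
    quantize just those k surviving values. The argument for a joint design
    is that choosing the two stages independently like this is suboptimal."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]     # indices of top-k entries
    vals = grad[idx]
    lo, hi = float(vals.min()), float(vals.max())
    scale = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    codes = np.round((vals - lo) / scale).astype(np.uint8)
    return idx, codes, lo, scale                     # what would be transmitted

rng = np.random.default_rng(5)
grad = rng.normal(size=4096)
idx, codes, lo, scale = naive_sparsify_then_quantize(grad, k=64)
recovered = np.zeros_like(grad)
recovered[idx] = lo + codes * scale                  # receiver-side reconstruction
print("compression error:", float(np.linalg.norm(recovered - grad)))
```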

Dynamic layer-wise sparsification for distributed deep learning

H Zhang, T Wu, Z Ma, F Li, J Liu - Future Generation Computer Systems, 2023 - Elsevier
Distributed stochastic gradient descent (SGD) algorithms are becoming popular for speeding
up deep learning model training by employing multiple computational devices (named …