Asynchronous decentralized SGD with quantized and local updates

G Nadiradze, A Sabour, P Davies… - Advances in Neural …, 2021 - proceedings.neurips.cc
Decentralized optimization is emerging as a viable alternative for scalable distributed
machine learning, but also introduces new challenges in terms of synchronization costs. To …
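
The title describes decentralized SGD in which nodes take local steps and exchange quantized models through pairwise gossip. The snippet below is a minimal, synchronous toy simulation of that general idea on a least-squares problem; the uniform quantizer, the random pairing, the two local steps per interaction, and the loss are illustrative assumptions, not the paper's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)


def quantize(v, levels=16):
    """Illustrative uniform quantizer: round each coordinate to a grid
    whose spacing is set by the vector's largest magnitude."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale


def local_sgd_step(x, A, b, lr=0.05):
    """One SGD step on the toy loss 0.5 * ||A x - b||^2, using a single
    randomly sampled row as the stochastic gradient."""
    i = rng.integers(len(b))
    grad = A[i] * (A[i] @ x - b[i])
    return x - lr * grad


# Toy problem shared by all nodes.
d, n_nodes = 10, 8
A = rng.normal(size=(100, d))
b = A @ rng.normal(size=d)
models = [rng.normal(size=d) for _ in range(n_nodes)]

for step in range(500):
    # Each node takes a few local steps between interactions.
    for k in range(n_nodes):
        for _ in range(2):
            models[k] = local_sgd_step(models[k], A, b)
    # A random pair of nodes exchanges quantized models and averages them,
    # mimicking one asynchronous gossip interaction.
    i, j = rng.choice(n_nodes, size=2, replace=False)
    avg = 0.5 * (quantize(models[i]) + quantize(models[j]))
    models[i] = models[j] = avg

print("final loss:", 0.5 * np.mean([(A @ m - b) @ (A @ m - b) for m in models]))
```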

Quantized distributed training of large models with convergence guarantees

I Markov, A Vladu, Q Guo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Communication-reduction techniques are a popular way to improve scalability in data-
parallel training of deep neural networks (DNNs). The recent emergence of large language …
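
The snippet is cut off before any technical detail. As a small illustration of the kind of communication-reduction primitive such work builds on, here is a QSGD-style unbiased stochastic quantizer; the operator and its parameters are assumptions for illustration, not necessarily what this paper uses.

```python
import numpy as np


def stochastic_quantize(v, levels=8, rng=None):
    """QSGD-style quantizer: scale by the vector norm, then randomly round
    each coordinate to one of `levels` uniform levels so that the result
    is an unbiased estimate of the input."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels       # values mapped into [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                 # probability of rounding up
    rounded = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * rounded / levels * norm


g = np.random.default_rng(1).normal(size=5)
print(g)
print(stochastic_quantize(g, rng=np.random.default_rng(2)))
```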

Compressed distributed gradient descent: Communication-efficient consensus over networks

X Zhang, J Liu, Z Zhu, ES Bentley - IEEE INFOCOM 2019-IEEE …, 2019 - ieeexplore.ieee.org
Network consensus optimization has received increasing attention in recent years and has
found important applications in many scientific and engineering fields. To solve network …

Gradient descent with compressed iterates

A Khaled, P Richtárik - arXiv preprint arXiv:1909.04716, 2019 - arxiv.org
We propose and analyze a new type of stochastic first order method: gradient descent with
compressed iterates (GDCI). GDCI in each iteration first compresses the current iterate using …
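
The abstract is truncated before the compression step is described. A common way to formalize gradient descent with compressed iterates is to evaluate the gradient at a compressed copy of the current iterate, x_{k+1} = x_k - γ∇f(C(x_k)); the sketch below assumes that form together with an unbiased random-sparsification operator C and a toy quadratic objective, which are illustrative choices rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_sparsify(x, keep_frac=0.5):
    """Illustrative compression operator: keep a random subset of the
    coordinates and rescale them so the operator is unbiased."""
    mask = rng.random(x.shape) < keep_frac
    return x * mask / keep_frac


def gdci(grad_f, x0, lr=0.1, steps=200, keep_frac=0.5):
    """Sketch of gradient descent with compressed iterates: each step
    evaluates the gradient at a compressed copy of the iterate."""
    x = x0.copy()
    for _ in range(steps):
        x_compressed = random_sparsify(x, keep_frac)
        x = x - lr * grad_f(x_compressed)
    return x


# Toy quadratic f(x) = 0.5 * ||x - x_star||^2, so grad_f(x) = x - x_star.
x_star = rng.normal(size=10)
x_hat = gdci(lambda x: x - x_star, np.zeros(10))
print("distance to optimum:", np.linalg.norm(x_hat - x_star))
```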

Communication-censored distributed stochastic gradient descent

W Li, Z Wu, T Chen, L Li, Q Ling - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org
This article develops a communication-efficient algorithm to solve the stochastic optimization
problem defined over a distributed network, aiming at reducing the burdensome …
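
As an illustration of the censoring idea named in the title, the sketch below shows a single communication round in which each worker transmits its gradient only when it has changed enough since its last transmission, and the aggregator otherwise reuses the stale copy. The worker/aggregator topology, fixed threshold, and norm test are simplifying assumptions; the article itself works over a distributed network.

```python
import numpy as np

rng = np.random.default_rng(0)


def censored_round(local_grads, last_sent, threshold):
    """One illustrative communication round: each worker sends its new
    gradient only when it differs enough from the last value it sent;
    otherwise the aggregator keeps the stale copy."""
    sent = 0
    for k, g in enumerate(local_grads):
        if np.linalg.norm(g - last_sent[k]) > threshold:
            last_sent[k] = g          # transmit the fresh gradient
            sent += 1
        # else: censored -- no message, last_sent[k] is reused
    aggregated = np.mean(last_sent, axis=0)
    return aggregated, sent


# Toy demo: 4 workers, 5-dimensional gradients that shrink over time.
n_workers, d = 4, 5
last_sent = np.zeros((n_workers, d))
for t in range(3):
    grads = [rng.normal(scale=1.0 / (t + 1), size=d) for _ in range(n_workers)]
    agg, sent = censored_round(grads, last_sent, threshold=0.8)
    print(f"round {t}: {sent}/{n_workers} transmitted, |aggregate| = {np.linalg.norm(agg):.3f}")
```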

Decentralized dynamic ADMM with quantized and censored communications

Y Liu, K Yuan, G Wu, Z Tian… - 2019 53rd Asilomar …, 2019 - ieeexplore.ieee.org
In this paper, we develop a quantized and communication-censored alternating direction
method of multipliers (ADMM) to solve a dynamic optimization problem defined over a …

On achieving scalability through relaxation

G Nadiradze - 2021 - research-explorer.ista.ac.at
The scalability of concurrent data structures and distributed algorithms strongly depends on
reducing the contention for shared resources and the costs of synchronization and …

Decentralized SGD with asynchronous, local and quantized updates

G Nadiradze, A Sabour, P Davies, I Markov, S Li… - 2019 - openreview.net
The ability to scale distributed optimization to large node counts has been one of the main
enablers of recent progress in machine learning. To this end, several techniques have been …

Hybrid Decentralized Optimization: First- and Zeroth-Order Optimizers Can Be Jointly Leveraged For Faster Convergence

S Talaei, G Nadiradze, D Alistarh - arXiv preprint arXiv:2210.07703, 2022 - arxiv.org
Distributed optimization has become one of the standard ways of speeding up machine
learning training, and most of the research in the area focuses on distributed first-order …
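
The title indicates that nodes with only zeroth-order (function-value) access are combined with nodes that have first-order (gradient) access. Below is a toy two-node sketch: one node estimates gradients with a two-point randomized finite-difference estimator while the other uses exact gradients, and they average after each step. The estimator, step sizes, sample count, and averaging scheme are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)


def zeroth_order_grad(f, x, mu=1e-4, samples=20):
    """Two-point zeroth-order gradient estimate: average of directional
    finite differences along random Gaussian directions."""
    est = np.zeros_like(x)
    for _ in range(samples):
        u = rng.normal(size=x.size)
        est += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return est / samples


# Toy objective f(x) = 0.5 * ||x - x_star||^2 shared by both nodes.
x_star = rng.normal(size=8)
f = lambda x: 0.5 * np.dot(x - x_star, x - x_star)
grad_f = lambda x: x - x_star

x_zo, x_fo, lr = np.zeros(8), np.zeros(8), 0.2
for _ in range(100):
    x_zo = x_zo - lr * zeroth_order_grad(f, x_zo)  # node with function values only
    x_fo = x_fo - lr * grad_f(x_fo)                # node with true gradients
    x_zo = x_fo = 0.5 * (x_zo + x_fo)              # consensus averaging step

print("error:", np.linalg.norm(x_zo - x_star))
```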

Robust communication strategy for federated learning by incorporating self-attention

Y Xu, X Li, Z Yang, HJ Song - 2020 International Conference …, 2020 - spiedigitallibrary.org
Federated learning is an emerging machine learning setting, which can train a shared
model on large amounts of decentralized data while protecting data privacy. However, the …