A linearly convergent algorithm for decentralized optimization: Sending less bits for free!

D Kovalev, A Koloskova, M Jaggi… - International …, 2021 - proceedings.mlr.press
Decentralized optimization methods enable on-device training of machine learning models
without a central coordinator. In many scenarios, communication between devices is energy …

A DAG model of synchronous stochastic gradient descent in distributed deep learning

S Shi, Q Wang, X Chu, B Li - 2018 IEEE 24th International …, 2018 - ieeexplore.ieee.org
With huge amounts of training data, deep learning has made great breakthroughs in many
artificial intelligence (AI) applications. However, such large-scale data sets present …

Matcha: A matching-based link scheduling strategy to speed up distributed optimization

J Wang, AK Sahu, G Joshi, S Kar - IEEE Transactions on Signal …, 2022 - ieeexplore.ieee.org
In this paper, we study the problem of distributed optimization using an arbitrary network of
lightweight computing nodes, where each node can only send/receive information to/from its …

Minimizing training time of distributed machine learning by reducing data communication

Y Duan, N Wang, J Wu - IEEE Transactions on Network …, 2021 - ieeexplore.ieee.org
Due to the additive property of most machine learning objective functions, training can
be distributed across multiple machines. Distributed machine learning is an efficient way to deal …

Decentralized SGD and average-direction SAM are asymptotically equivalent

T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across a
massive number of devices simultaneously without the control of a central server. However, existing …
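
For orientation, the D-SGD update this entry refers to can be written in a few lines: each node takes a local stochastic gradient step and then gossip-averages its parameters with its neighbors through a mixing matrix. The Python/NumPy sketch below is a generic illustration under assumed ingredients (a fixed doubly stochastic ring matrix W, toy quadratic losses, a hypothetical dsgd_round helper); it does not reproduce the cited paper's method or its equivalence analysis.

import numpy as np

def dsgd_round(params, grads, W, lr):
    # params: (n_nodes, dim) per-node models; grads: matching stochastic gradients;
    # W: doubly stochastic mixing matrix; lr: step size.
    # Generic D-SGD step for illustration, not the cited paper's algorithm.
    local = params - lr * grads   # local SGD step on every node
    return W @ local              # gossip averaging with neighbors

# Toy usage: 4 nodes on a ring, local loss 0.5*||x||^2 so the gradient is x itself.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x = np.random.default_rng(0).normal(size=(4, 3))
for _ in range(10):
    x = dsgd_round(x, grads=x, W=W, lr=0.1)
print(x)  # node models shrink toward a common consensus point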

Local AdaAlter: Communication-efficient stochastic gradient descent with adaptive learning rates

C Xie, O Koyejo, I Gupta, H Lin - arXiv preprint arXiv:1911.09030, 2019 - arxiv.org
When scaling distributed training, the communication overhead is often the bottleneck. In
this paper, we propose a novel SGD variant with reduced communication and adaptive …

Dynamic aggregation for heterogeneous quantization in federated learning

S Chen, C Shen, L Zhang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Communication is widely known as the primary bottleneck of federated learning, and
quantization of local model updates before uploading to the parameter server is an effective …
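
As a reference point for the quantization step mentioned in this snippet, the Python/NumPy sketch below shows a standard unbiased stochastic uniform quantizer applied to a local model update before upload. The quantize function and the num_levels parameter are illustrative choices; this is a generic compressor, not the paper's dynamic aggregation scheme for heterogeneous quantization.

import numpy as np

def quantize(update, num_levels=4, rng=None):
    # Unbiased stochastic uniform quantization (QSGD-style), shown for illustration only.
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    if norm == 0.0:
        return np.zeros_like(update)
    scaled = np.abs(update) / norm * num_levels   # coordinates mapped into [0, num_levels]
    lower = np.floor(scaled)
    levels = lower + (rng.random(update.shape) < scaled - lower)  # stochastic rounding up
    return np.sign(update) * levels * norm / num_levels

# Toy usage: averaging many quantized copies recovers the update (unbiasedness).
u = np.random.default_rng(1).normal(size=5)
avg = np.mean([quantize(u, rng=np.random.default_rng(s)) for s in range(2000)], axis=0)
print(u)
print(avg)  # close to u on average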

Stochastic distributed learning with gradient quantization and double-variance reduction

S Horváth, D Kovalev, K Mishchenko… - Optimization Methods …, 2023 - Taylor & Francis
We consider distributed optimization over several devices, each sending incremental model
updates to a central server. This setting is considered, for instance, in federated learning …

Multi-tier federated learning for vertically partitioned data

A Das, S Patterson - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We consider decentralized model training in tiered communication networks. Our network
model consists of a set of silos, each holding a vertical partition of the data. Each silo …

Decentralized federated learning with unreliable communications

H Ye, L Liang, GY Li - IEEE Journal of Selected Topics in Signal …, 2022 - ieeexplore.ieee.org
Decentralized federated learning, inherited from decentralized learning, enables the edge
devices to collaborate on model training in a peer-to-peer manner without the assistance of …