Convergence analysis of sequential federated learning on heterogeneous data

Y Li, X Lyu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
There are two categories of methods in Federated Learning (FL) for joint training across
multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) …
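As a rough illustration of the two regimes named in this abstract, the sketch below contrasts one communication round of parallel FL (clients train from the same global model and the server averages their results) with sequential FL (the model is relayed through the clients). The least-squares objective, the helper `local_sgd`, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def local_sgd(w, X, y, steps=5, lr=0.1):
    # A few local gradient steps on one client's least-squares objective.
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def parallel_fl_round(w, clients):
    # Parallel FL (PFL): every client trains from the same global model
    # in parallel; the server averages the resulting local models.
    return np.mean([local_sgd(w.copy(), X, y) for X, y in clients], axis=0)

def sequential_fl_round(w, clients):
    # Sequential FL (SFL): the model is passed from client to client,
    # each continuing training from its predecessor's result.
    for X, y in clients:
        w = local_sgd(w, X, y)
    return w
```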

On the convergence of federated averaging with cyclic client participation

YJ Cho, P Sharma, G Joshi, Z Xu… - International …, 2023 - proceedings.mlr.press
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
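To make the participation pattern concrete, here is a minimal FedAvg round loop in which clients are partitioned into groups that take part in a fixed cyclic order rather than being sampled uniformly at random; the linear-regression update, group count, and step sizes are assumptions for illustration only.

```python
import numpy as np

def fedavg_cyclic(w, clients, num_groups=4, rounds=8, local_steps=5, lr=0.1):
    # Clients are split into num_groups groups; round r activates group
    # r mod num_groups, so participation cycles deterministically.
    groups = np.array_split(np.arange(len(clients)), num_groups)
    for r in range(rounds):
        updates = []
        for i in groups[r % num_groups]:
            X, y = clients[i]
            w_i = w.copy()
            for _ in range(local_steps):
                w_i = w_i - lr * X.T @ (X @ w_i - y) / len(y)
            updates.append(w_i)
        w = np.mean(updates, axis=0)  # server averages the active group's models
    return w
```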

Tighter lower bounds for shuffling SGD: Random permutations and beyond

J Cha, J Lee, C Yun - International Conference on Machine …, 2023 - proceedings.mlr.press
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-) convex finite-sum minimization problems. Unlike most …
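For reference, the two sampling schemes being compared can be written in a few lines; the quadratic per-example loss, step sizes, and seeds below are illustrative assumptions.

```python
import numpy as np

def rr_sgd(w, X, y, epochs=10, lr=0.01, seed=0):
    # Without-replacement (random reshuffling) SGD: each epoch visits every
    # example exactly once, in a fresh random permutation.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - lr * (X[i] @ w - y[i]) * X[i]
    return w

def iid_sgd(w, X, y, iters=1000, lr=0.01, seed=0):
    # With-replacement SGD: indices are drawn uniformly at random, so an
    # example can be revisited (or skipped) within a pass over the data.
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        i = rng.integers(len(y))
        w = w - lr * (X[i] @ w - y[i]) * X[i]
    return w
```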

Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond

C Yun, S Rajput, S Sra - arXiv preprint arXiv:2110.10342, 2021 - arxiv.org
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
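A compact way to see the difference between the two baselines: minibatch SGD takes one averaged-gradient step per communication round, while local SGD lets each client take several steps before the server averages the iterates. The least-squares gradients and step counts here are assumed purely for illustration.

```python
import numpy as np

def minibatch_sgd_round(w, clients, lr=0.1):
    # Minibatch SGD: one step using the average of the clients' gradients
    # evaluated at the shared current iterate.
    g = np.mean([X.T @ (X @ w - y) / len(y) for X, y in clients], axis=0)
    return w - lr * g

def local_sgd_round(w, clients, local_steps=5, lr=0.1):
    # Local SGD / federated averaging: each client takes several local steps
    # between communications; the server averages the resulting models.
    iterates = []
    for X, y in clients:
        w_i = w.copy()
        for _ in range(local_steps):
            w_i = w_i - lr * X.T @ (X @ w_i - y) / len(y)
        iterates.append(w_i)
    return np.mean(iterates, axis=0)
```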

Federated optimization algorithms with random reshuffling and gradient compression

A Sadiev, G Malinovsky, E Gorbunov, I Sokolov… - arXiv preprint arXiv …, 2022 - arxiv.org
Gradient compression is a popular technique for reducing the communication cost of
stochastic first-order methods in distributed training of machine learning models. However …
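One common compressor in this setting is top-k sparsification; the sketch below pairs it with a random-reshuffling epoch. The choice of compressor, the value of `k`, and the least-squares gradients are assumptions, not the specific operators analyzed in the paper.

```python
import numpy as np

def topk(g, k):
    # Top-k sparsification: keep the k largest-magnitude coordinates and
    # zero the rest, so only k values (plus indices) need to be transmitted.
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # assumes k <= g.size
    out[idx] = g[idx]
    return out

def compressed_rr_epoch(w, X, y, k=10, lr=0.01, seed=0):
    # One random-reshuffling epoch where each gradient is compressed before
    # being applied (a stand-in for sending it to the server).
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(y)):
        g = (X[i] @ w - y[i]) * X[i]
        w = w - lr * topk(g, k)
    return w
```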

Benign underfitting of stochastic gradient descent

T Koren, R Livni, Y Mansour… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study to what extent stochastic gradient descent (SGD) may be understood as
a "conventional" learning rule that achieves generalization performance by obtaining a good …

A general analysis of example-selection for stochastic gradient descent

Y Lu, SY Meng, C De Sa - International Conference on Learning …, 2022 - par.nsf.gov
Training example order in SGD has long been known to affect convergence rate. Recent
results show that accelerated rates are possible in a variety of cases for permutation-based …

Convergence Analysis of Sequential Split Learning on Heterogeneous Data

Y Li, X Lyu - arXiv preprint arXiv:2302.01633, 2023 - arxiv.org
Federated Learning (FL) and Split Learning (SL) are two popular paradigms of distributed
machine learning. By offloading the computation-intensive portions to the server, SL is …
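The offloading idea can be shown with a forward pass that is literally split at a cut layer: the client runs the early layers and ships only the activations, while the server runs the compute-heavy remainder. The two-layer ReLU network below is an assumed toy model, not the paper's architecture.

```python
import numpy as np

def client_forward(x, W_client):
    # Client side: compute activations up to the cut layer; only these
    # activations (not raw data or the full model) leave the device.
    return np.maximum(0.0, x @ W_client)

def server_forward(h, W_server):
    # Server side: the compute-intensive remainder of the network.
    return h @ W_server

def split_forward(x, W_client, W_server):
    h = client_forward(x, W_client)      # runs on the client
    return server_forward(h, W_server)   # offloaded to the server
```

In the sequential variant, clients additionally take turns: each receives the client-side weights from its predecessor before training, which is what connects this setting to sequential FL.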

Permutation-Based SGD: Is Random Optimal?

S Rajput, K Lee, D Papailiopoulos - arXiv preprint arXiv:2102.09718, 2021 - arxiv.org
A recent line of ground-breaking results for permutation-based SGD has corroborated a
widely observed phenomenon: random permutations offer faster convergence than with …

Mini-Batch Optimization of Contrastive Loss

J Cho, K Sreenivasan, K Lee, K Mun, S Yi… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive learning has gained significant attention as a method for self-supervised
learning. The contrastive loss function ensures that embeddings of positive sample pairs …
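As a point of reference for what a mini-batch contrastive objective computes, here is an InfoNCE-style loss in which row i of the two embedding batches forms a positive pair and every other row serves as an in-batch negative; the normalization and temperature are common defaults assumed here, not necessarily the exact loss studied in the paper.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    # Mini-batch contrastive (InfoNCE-style) loss: z_a[i] and z_b[i] are a
    # positive pair; all other rows in the batch act as negatives.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # cross-entropy on the diagonal
```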