SCAFFOLD: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
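
SCAFFOLD's core idea is to correct client drift during local updates with control variates. Below is a minimal NumPy sketch of that mechanism, assuming synthetic least-squares client objectives, five sampled clients per round, and the cheaper of the two control-variate update rules described in the paper; the step sizes and problem sizes are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: client i holds a least-squares objective
# f_i(x) = (1/2m) * ||A_i x - b_i||^2  (an illustrative assumption).
d, num_clients, K = 5, 10, 20                    # dimension, clients, local steps
A = [rng.normal(size=(30, d)) for _ in range(num_clients)]
b = [rng.normal(size=30) for _ in range(num_clients)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

x = np.zeros(d)                                  # server model
c = np.zeros(d)                                  # server control variate
ci = [np.zeros(d) for _ in range(num_clients)]   # client control variates
eta_l, eta_g = 0.05, 1.0                         # local / global step sizes (assumed)

for rnd in range(50):
    sampled = rng.choice(num_clients, size=5, replace=False)
    dx_sum, dc_sum = np.zeros(d), np.zeros(d)
    for i in sampled:
        y = x.copy()
        for _ in range(K):
            # drift-corrected local step: subtract client variate, add server variate
            y -= eta_l * (grad(i, y) - ci[i] + c)
        ci_new = ci[i] - c + (x - y) / (K * eta_l)   # cheaper control-variate update
        dx_sum += y - x
        dc_sum += ci_new - ci[i]
        ci[i] = ci_new
    x += eta_g * dx_sum / len(sampled)           # server model update
    c += dc_sum / num_clients                    # server control-variate update
```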

Random reshuffling: Simple analysis with vast improvements

K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
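
Random Reshuffling draws a fresh permutation of the data at the start of every epoch and takes one gradient step per component in that order. A minimal sketch, assuming a synthetic least-squares finite sum and a fixed step size (both illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite-sum least-squares problem:
# f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

x = np.zeros(d)
lr = 0.01

for epoch in range(30):
    perm = rng.permutation(n)              # Random Reshuffling: fresh permutation per epoch
    for i in perm:                         # one full pass, each component used exactly once
        g = (A[i] @ x - b[i]) * A[i]       # gradient of the i-th component
        x -= lr * g
    # With-replacement SGD would instead draw i = rng.integers(n) at every step,
    # so a single pass may repeat some components and skip others.
```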

Convergence analysis of sequential federated learning on heterogeneous data

Y Li, X Lyu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
There are two categories of methods in Federated Learning (FL) for joint training across
multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) …
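
The parallel/sequential distinction can be made concrete with a short sketch: in PFL every client starts from the same server model and the results are averaged, while in SFL the model is relayed through the clients one after another. The quadratic client objectives, client count, and reshuffled visiting order below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, K, lr = 5, 8, 10, 0.02               # dimension, clients, local steps, step size

# Heterogeneous least-squares client objectives (illustrative assumption).
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [rng.normal(size=20) for _ in range(M)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

def local_sgd(i, x):
    for _ in range(K):
        x = x - lr * grad(i, x)
    return x

def pfl_round(x):
    # Parallel FL: every client starts from the same server model; results are averaged.
    return np.mean([local_sgd(i, x) for i in range(M)], axis=0)

def sfl_round(x):
    # Sequential FL: the model is relayed through the clients one after another.
    for i in rng.permutation(M):           # visiting order reshuffled each round
        x = local_sgd(i, x)
    return x

x_pfl = np.zeros(d)
x_sfl = np.zeros(d)
for _ in range(30):
    x_pfl, x_sfl = pfl_round(x_pfl), sfl_round(x_sfl)
```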

On the convergence of federated averaging with cyclic client participation

YJ Cho, P Sharma, G Joshi, Z Xu… - International …, 2023 - proceedings.mlr.press
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
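
Cyclic client participation means the client population is partitioned into groups that become available in a fixed rotating order (e.g. by time zone), and each round FedAvg samples clients only from the currently available group. A rough sketch under assumed group counts, per-round sample sizes, and synthetic objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, K, lr = 5, 12, 10, 0.02              # dimension, clients, local steps, step size

# Synthetic least-squares client objectives (illustrative assumption).
A = [rng.normal(size=(20, d)) for _ in range(N)]
b = [rng.normal(size=20) for _ in range(N)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Clients are split into groups that become available in a fixed cyclic order;
# the group count and per-round sample size are assumptions for illustration.
groups = np.array_split(rng.permutation(N), 4)
x = np.zeros(d)

for rnd in range(40):
    group = groups[rnd % len(groups)]      # only this group participates this round
    sampled = rng.choice(group, size=2, replace=False)
    local_models = []
    for i in sampled:
        y = x.copy()
        for _ in range(K):                 # plain local SGD steps, as in FedAvg
            y -= lr * grad(i, y)
        local_models.append(y)
    x = np.mean(local_models, axis=0)      # FedAvg aggregation
```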

Tighter lower bounds for shuffling SGD: Random permutations and beyond

J Cha, J Lee, C Yun - International Conference on Machine …, 2023 - proceedings.mlr.press
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-) convex finite-sum minimization problems. Unlike most …

A unified convergence analysis for shuffling-type gradient methods

LM Nguyen, Q Tran-Dinh, DT Phan, PH Nguyen… - Journal of Machine …, 2021 - jmlr.org
In this paper, we propose a unified convergence analysis for a class of generic shuffling-type
gradient methods for solving finite-sum optimization problems. Our analysis works with any …

SGD with shuffling: optimal rates without component convexity and large epoch requirements

K Ahn, C Yun, S Sra - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
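
The shuffling schemes studied in this line of work differ only in how the per-epoch visiting order is generated. A small sketch of three common choices, with Incremental Gradient included for contrast (the exact set of schemes analyzed varies from paper to paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                      # number of components (illustrative)

def epoch_orders(scheme, epochs):
    """Yield the visiting order used in each epoch under a given shuffling scheme."""
    fixed = rng.permutation(n)             # drawn once; reused by Single Shuffling
    for _ in range(epochs):
        if scheme == "RR":                 # Random Reshuffling: new permutation every epoch
            yield rng.permutation(n)
        elif scheme == "SS":               # Single Shuffling (shuffle once): one permutation, reused
            yield fixed
        elif scheme == "IG":               # Incremental Gradient: fixed natural order
            yield np.arange(n)

for scheme in ("RR", "SS", "IG"):
    print(scheme, [list(order) for order in epoch_orders(scheme, 2)])
```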

Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond

C Yun, S Rajput, S Sra - arXiv preprint arXiv:2110.10342, 2021 - arxiv.org
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
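
The baseline comparison is between local SGD, where each worker takes several steps on its own data before averaging, and minibatch SGD, where every step uses a gradient averaged across workers at the shared iterate. A simplified sketch with a matched per-round gradient budget; it uses full per-worker gradients rather than the shuffled single-sample steps analyzed in the paper, and all problem parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, K, lr = 5, 6, 10, 0.02               # dimension, workers, steps per round, step size

# Synthetic per-worker least-squares objectives (illustrative assumption).
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [rng.normal(size=20) for _ in range(M)]

def grad(m, x):
    return A[m].T @ (A[m] @ x - b[m]) / len(b[m])

def local_sgd_round(x):
    # Local SGD (federated averaging): K independent steps per worker, then average.
    outs = []
    for m in range(M):
        y = x.copy()
        for _ in range(K):
            y -= lr * grad(m, y)
        outs.append(y)
    return np.mean(outs, axis=0)

def minibatch_round(x):
    # Minibatch SGD: every step averages gradients taken at the shared iterate,
    # matching the K * M gradient evaluations of one local-SGD round.
    for _ in range(K):
        x = x - lr * np.mean([grad(m, x) for m in range(M)], axis=0)
    return x

x_loc = np.zeros(d)
x_mb = np.zeros(d)
for _ in range(30):
    x_loc, x_mb = local_sgd_round(x_loc), minibatch_round(x_mb)
```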

Closing the convergence gap of SGD without replacement

S Rajput, A Gupta… - … Conference on Machine …, 2020 - proceedings.mlr.press
Stochastic gradient descent without replacement sampling is widely used in practice for
model training. However, the vast majority of SGD analyses assumes data is sampled with …

Stochastic Newton and cubic Newton methods with simple local linear-quadratic rates

D Kovalev, K Mishchenko, P Richtárik - arXiv preprint arXiv:1912.01597, 2019 - arxiv.org
We present two new remarkably simple stochastic second-order methods for minimizing the
average of a very large number of sufficiently smooth and strongly convex functions. The first …
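
As I read the abstract, the first method keeps a per-function anchor point, forms a Newton-type step from gradients and Hessians stored at those anchors, and refreshes only a random subset of anchors each iteration. The sketch below follows that outline for an L2-regularized logistic-regression finite sum; the objective, the refresh batch size tau, and the initialization are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 4, 0.1                     # samples, dimension, L2 regularization (assumed)
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# f_i(w) = log(1 + exp(-y_i * X_i @ w)) + (lam / 2) * ||w||^2  (smooth, strongly convex)
def grad_hess(i, w):
    z = y[i] * (X[i] @ w)
    s = 1.0 / (1.0 + np.exp(z))            # sigmoid(-z)
    g = -y[i] * s * X[i] + lam * w
    h = s * (1.0 - s) * np.outer(X[i], X[i]) + lam * np.eye(d)
    return g, h

x = np.zeros(d)
anchors = np.zeros((n, d))                 # per-function anchor points w_i
g_store = np.zeros((n, d))                 # gradients evaluated at the anchors
h_store = np.zeros((n, d, d))              # Hessians evaluated at the anchors
for i in range(n):
    g_store[i], h_store[i] = grad_hess(i, anchors[i])
tau = 10                                   # anchors refreshed per iteration (assumed)

for step in range(30):
    H_bar = h_store.mean(axis=0)
    rhs = np.mean([h_store[i] @ anchors[i] - g_store[i] for i in range(n)], axis=0)
    x = np.linalg.solve(H_bar, rhs)        # Newton-type step built from stored local models
    for i in rng.choice(n, size=tau, replace=False):
        anchors[i] = x                     # refresh only a random subset of anchors
        g_store[i], h_store[i] = grad_hess(i, anchors[i])
```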