SCAFFOLD: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
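
SCAFFOLD's core idea is to correct client drift during local updates with control variates. Below is a minimal NumPy sketch of that mechanism, assuming synthetic least-squares client objectives, five sampled clients per round, and the cheaper of the two control-variate update rules described in the paper; the step sizes and problem sizes are arbitrary illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: client i holds a least-squares objective
# f_i(x) = (1/2m) * ||A_i x - b_i||^2  (an illustrative assumption).
d, num_clients, K = 5, 10, 20                    # dimension, clients, local steps
A = [rng.normal(size=(30, d)) for _ in range(num_clients)]
b = [rng.normal(size=30) for _ in range(num_clients)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

x = np.zeros(d)                                  # server model
c = np.zeros(d)                                  # server control variate
ci = [np.zeros(d) for _ in range(num_clients)]   # client control variates
eta_l, eta_g = 0.05, 1.0                         # local / global step sizes (assumed)

for rnd in range(50):
    sampled = rng.choice(num_clients, size=5, replace=False)
    dx_sum, dc_sum = np.zeros(d), np.zeros(d)
    for i in sampled:
        y = x.copy()
        for _ in range(K):
            # drift-corrected local step: subtract client variate, add server variate
            y -= eta_l * (grad(i, y) - ci[i] + c)
        ci_new = ci[i] - c + (x - y) / (K * eta_l)   # cheaper control-variate update
        dx_sum += y - x
        dc_sum += ci_new - ci[i]
        ci[i] = ci_new
    x += eta_g * dx_sum / len(sampled)           # server model update
    c += dc_sum / num_clients                    # server control-variate update
```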

Random reshuffling: Simple analysis with vast improvements

K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
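
Random Reshuffling draws a fresh permutation of the data at the start of every epoch and takes one gradient step per component in that order. A minimal sketch, assuming a synthetic least-squares finite sum and a fixed step size (both illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite-sum least-squares problem:
# f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

x = np.zeros(d)
lr = 0.01

for epoch in range(30):
    perm = rng.permutation(n)              # Random Reshuffling: fresh permutation per epoch
    for i in perm:                         # one full pass, each component used exactly once
        g = (A[i] @ x - b[i]) * A[i]       # gradient of the i-th component
        x -= lr * g
    # With-replacement SGD would instead draw i = rng.integers(n) at every step,
    # so a single pass may repeat some components and skip others.
```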

Convergence analysis of sequential federated learning on heterogeneous data

Y Li, X Lyu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
There are two categories of methods in Federated Learning (FL) for joint training across
multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) …
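
The parallel/sequential distinction can be made concrete with a short sketch: in PFL every client starts from the same server model and the results are averaged, while in SFL the model is relayed through the clients one after another. The quadratic client objectives, client count, and reshuffled visiting order below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, K, lr = 5, 8, 10, 0.02               # dimension, clients, local steps, step size

# Heterogeneous least-squares client objectives (illustrative assumption).
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [rng.normal(size=20) for _ in range(M)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

def local_sgd(i, x):
    for _ in range(K):
        x = x - lr * grad(i, x)
    return x

def pfl_round(x):
    # Parallel FL: every client starts from the same server model; results are averaged.
    return np.mean([local_sgd(i, x) for i in range(M)], axis=0)

def sfl_round(x):
    # Sequential FL: the model is relayed through the clients one after another.
    for i in rng.permutation(M):           # visiting order reshuffled each round
        x = local_sgd(i, x)
    return x

x_pfl = np.zeros(d)
x_sfl = np.zeros(d)
for _ in range(30):
    x_pfl, x_sfl = pfl_round(x_pfl), sfl_round(x_sfl)
```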

On the convergence of federated averaging with cyclic client participation

YJ Cho, P Sharma, G Joshi, Z Xu… - International …, 2023 - proceedings.mlr.press
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
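
Cyclic client participation means the client population is partitioned into groups that become available in a fixed rotating order (e.g. by time zone), and each round FedAvg samples clients only from the currently available group. A rough sketch under assumed group counts, per-round sample sizes, and synthetic objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, K, lr = 5, 12, 10, 0.02              # dimension, clients, local steps, step size

# Synthetic least-squares client objectives (illustrative assumption).
A = [rng.normal(size=(20, d)) for _ in range(N)]
b = [rng.normal(size=20) for _ in range(N)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Clients are split into groups that become available in a fixed cyclic order;
# the group count and per-round sample size are assumptions for illustration.
groups = np.array_split(rng.permutation(N), 4)
x = np.zeros(d)

for rnd in range(40):
    group = groups[rnd % len(groups)]      # only this group participates this round
    sampled = rng.choice(group, size=2, replace=False)
    local_models = []
    for i in sampled:
        y = x.copy()
        for _ in range(K):                 # plain local SGD steps, as in FedAvg
            y -= lr * grad(i, y)
        local_models.append(y)
    x = np.mean(local_models, axis=0)      # FedAvg aggregation
```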

Tighter lower bounds for shuffling SGD: Random permutations and beyond

J Cha, J Lee, C Yun - International Conference on Machine …, 2023 - proceedings.mlr.press
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-) convex finite-sum minimization problems. Unlike most …

A unified convergence analysis for shuffling-type gradient methods

LM Nguyen, Q Tran-Dinh, DT Phan, PH Nguyen… - Journal of Machine …, 2021 - jmlr.org
In this paper, we propose a unified convergence analysis for a class of generic shuffling-type
gradient methods for solving finite-sum optimization problems. Our analysis works with any …

SGD with shuffling: optimal rates without component convexity and large epoch requirements

K Ahn, C Yun, S Sra - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
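
The shuffling schemes studied in this line of work differ only in how the per-epoch visiting order is generated. A small sketch of three common choices, with Incremental Gradient included for contrast (the exact set of schemes analyzed varies from paper to paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                      # number of components (illustrative)

def epoch_orders(scheme, epochs):
    """Yield the visiting order used in each epoch under a given shuffling scheme."""
    fixed = rng.permutation(n)             # drawn once; reused by Single Shuffling
    for _ in range(epochs):
        if scheme == "RR":                 # Random Reshuffling: new permutation every epoch
            yield rng.permutation(n)
        elif scheme == "SS":               # Single Shuffling (shuffle once): one permutation, reused
            yield fixed
        elif scheme == "IG":               # Incremental Gradient: fixed natural order
            yield np.arange(n)

for scheme in ("RR", "SS", "IG"):
    print(scheme, [list(order) for order in epoch_orders(scheme, 2)])
```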

Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond

C Yun, S Rajput, S Sra - arXiv preprint arXiv:2110.10342, 2021 - arxiv.org
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
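
The baseline comparison is between local SGD, where each worker takes several steps on its own data before averaging, and minibatch SGD, where every step uses a gradient averaged across workers at the shared iterate. A simplified sketch with a matched per-round gradient budget; it uses full per-worker gradients rather than the shuffled single-sample steps analyzed in the paper, and all problem parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, K, lr = 5, 6, 10, 0.02               # dimension, workers, steps per round, step size

# Synthetic per-worker least-squares objectives (illustrative assumption).
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [rng.normal(size=20) for _ in range(M)]

def grad(m, x):
    return A[m].T @ (A[m] @ x - b[m]) / len(b[m])

def local_sgd_round(x):
    # Local SGD (federated averaging): K independent steps per worker, then average.
    outs = []
    for m in range(M):
        y = x.copy()
        for _ in range(K):
            y -= lr * grad(m, y)
        outs.append(y)
    return np.mean(outs, axis=0)

def minibatch_round(x):
    # Minibatch SGD: every step averages gradients taken at the shared iterate,
    # matching the K * M gradient evaluations of one local-SGD round.
    for _ in range(K):
        x = x - lr * np.mean([grad(m, x) for m in range(M)], axis=0)
    return x

x_loc = np.zeros(d)
x_mb = np.zeros(d)
for _ in range(30):
    x_loc, x_mb = local_sgd_round(x_loc), minibatch_round(x_mb)
```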

Closing the convergence gap of SGD without replacement

S Rajput, A Gupta… - … Conference on Machine …, 2020 - proceedings.mlr.press
Stochastic gradient descent without replacement sampling is widely used in practice for
model training. However, the vast majority of SGD analyses assumes data is sampled with …

Stochastic Newton and cubic Newton methods with simple local linear-quadratic rates

D Kovalev, K Mishchenko, P Richtárik - arXiv preprint arXiv:1912.01597, 2019 - arxiv.org
We present two new remarkably simple stochastic second-order methods for minimizing the
average of a very large number of sufficiently smooth and strongly convex functions. The first …
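
As I read the abstract, the first method keeps a per-function anchor point, forms a Newton-type step from gradients and Hessians stored at those anchors, and refreshes only a random subset of anchors each iteration. The sketch below follows that outline for an L2-regularized logistic-regression finite sum; the objective, the refresh batch size tau, and the initialization are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 4, 0.1                     # samples, dimension, L2 regularization (assumed)
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# f_i(w) = log(1 + exp(-y_i * X_i @ w)) + (lam / 2) * ||w||^2  (smooth, strongly convex)
def grad_hess(i, w):
    z = y[i] * (X[i] @ w)
    s = 1.0 / (1.0 + np.exp(z))            # sigmoid(-z)
    g = -y[i] * s * X[i] + lam * w
    h = s * (1.0 - s) * np.outer(X[i], X[i]) + lam * np.eye(d)
    return g, h

x = np.zeros(d)
anchors = np.zeros((n, d))                 # per-function anchor points w_i
g_store = np.zeros((n, d))                 # gradients evaluated at the anchors
h_store = np.zeros((n, d, d))              # Hessians evaluated at the anchors
for i in range(n):
    g_store[i], h_store[i] = grad_hess(i, anchors[i])
tau = 10                                   # anchors refreshed per iteration (assumed)

for step in range(30):
    H_bar = h_store.mean(axis=0)
    rhs = np.mean([h_store[i] @ anchors[i] - g_store[i] for i in range(n)], axis=0)
    x = np.linalg.solve(H_bar, rhs)        # Newton-type step built from stored local models
    for i in rng.choice(n, size=tau, replace=False):
        anchors[i] = x                     # refresh only a random subset of anchors
        g_store[i], h_store[i] = grad_hess(i, anchors[i])
```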