Convergence analysis of sequential federated learning on heterogeneous data

Y Li, X Lyu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
There are two categories of methods in Federated Learning (FL) for joint training across
multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) …
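As a rough illustration of the two regimes named in this abstract, the sketch below contrasts one communication round of parallel FL (clients train from the same global model and the server averages their results) with sequential FL (the model is relayed through the clients). The least-squares objective, the helper `local_sgd`, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def local_sgd(w, X, y, steps=5, lr=0.1):
    # A few local gradient steps on one client's least-squares objective.
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def parallel_fl_round(w, clients):
    # Parallel FL (PFL): every client trains from the same global model
    # in parallel; the server averages the resulting local models.
    return np.mean([local_sgd(w.copy(), X, y) for X, y in clients], axis=0)

def sequential_fl_round(w, clients):
    # Sequential FL (SFL): the model is passed from client to client,
    # each continuing training from its predecessor's result.
    for X, y in clients:
        w = local_sgd(w, X, y)
    return w
```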

On the convergence of federated averaging with cyclic client participation

YJ Cho, P Sharma, G Joshi, Z Xu… - International …, 2023 - proceedings.mlr.press
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
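To make the participation pattern concrete, here is a minimal FedAvg round loop in which clients are partitioned into groups that take part in a fixed cyclic order rather than being sampled uniformly at random; the linear-regression update, group count, and step sizes are assumptions for illustration only.

```python
import numpy as np

def fedavg_cyclic(w, clients, num_groups=4, rounds=8, local_steps=5, lr=0.1):
    # Clients are split into num_groups groups; round r activates group
    # r mod num_groups, so participation cycles deterministically.
    groups = np.array_split(np.arange(len(clients)), num_groups)
    for r in range(rounds):
        updates = []
        for i in groups[r % num_groups]:
            X, y = clients[i]
            w_i = w.copy()
            for _ in range(local_steps):
                w_i = w_i - lr * X.T @ (X @ w_i - y) / len(y)
            updates.append(w_i)
        w = np.mean(updates, axis=0)  # server averages the active group's models
    return w
```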

Tighter lower bounds for shuffling SGD: Random permutations and beyond

J Cha, J Lee, C Yun - International Conference on Machine …, 2023 - proceedings.mlr.press
We study convergence lower bounds of without-replacement stochastic gradient descent
(SGD) for solving smooth (strongly-) convex finite-sum minimization problems. Unlike most …
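For reference, the two sampling schemes being compared can be written in a few lines; the quadratic per-example loss, step sizes, and seeds below are illustrative assumptions.

```python
import numpy as np

def rr_sgd(w, X, y, epochs=10, lr=0.01, seed=0):
    # Without-replacement (random reshuffling) SGD: each epoch visits every
    # example exactly once, in a fresh random permutation.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - lr * (X[i] @ w - y[i]) * X[i]
    return w

def iid_sgd(w, X, y, iters=1000, lr=0.01, seed=0):
    # With-replacement SGD: indices are drawn uniformly at random, so an
    # example can be revisited (or skipped) within a pass over the data.
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        i = rng.integers(len(y))
        w = w - lr * (X[i] @ w - y[i]) * X[i]
    return w
```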

Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond

C Yun, S Rajput, S Sra - arXiv preprint arXiv:2110.10342, 2021 - arxiv.org
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
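A compact way to see the difference between the two baselines: minibatch SGD takes one averaged-gradient step per communication round, while local SGD lets each client take several steps before the server averages the iterates. The least-squares gradients and step counts here are assumed purely for illustration.

```python
import numpy as np

def minibatch_sgd_round(w, clients, lr=0.1):
    # Minibatch SGD: one step using the average of the clients' gradients
    # evaluated at the shared current iterate.
    g = np.mean([X.T @ (X @ w - y) / len(y) for X, y in clients], axis=0)
    return w - lr * g

def local_sgd_round(w, clients, local_steps=5, lr=0.1):
    # Local SGD / federated averaging: each client takes several local steps
    # between communications; the server averages the resulting models.
    iterates = []
    for X, y in clients:
        w_i = w.copy()
        for _ in range(local_steps):
            w_i = w_i - lr * X.T @ (X @ w_i - y) / len(y)
        iterates.append(w_i)
    return np.mean(iterates, axis=0)
```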

Federated optimization algorithms with random reshuffling and gradient compression

A Sadiev, G Malinovsky, E Gorbunov, I Sokolov… - arXiv preprint arXiv …, 2022 - arxiv.org
Gradient compression is a popular technique for reducing the communication cost of
stochastic first-order methods in distributed training of machine learning models. However …
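One common compressor in this setting is top-k sparsification; the sketch below pairs it with a random-reshuffling epoch. The choice of compressor, the value of `k`, and the least-squares gradients are assumptions, not the specific operators analyzed in the paper.

```python
import numpy as np

def topk(g, k):
    # Top-k sparsification: keep the k largest-magnitude coordinates and
    # zero the rest, so only k values (plus indices) need to be transmitted.
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # assumes k <= g.size
    out[idx] = g[idx]
    return out

def compressed_rr_epoch(w, X, y, k=10, lr=0.01, seed=0):
    # One random-reshuffling epoch where each gradient is compressed before
    # being applied (a stand-in for sending it to the server).
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(y)):
        g = (X[i] @ w - y[i]) * X[i]
        w = w - lr * topk(g, k)
    return w
```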

Benign underfitting of stochastic gradient descent

T Koren, R Livni, Y Mansour… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study to what extent stochastic gradient descent (SGD) may be understood as
a "conventional" learning rule that achieves generalization performance by obtaining a good …

A general analysis of example-selection for stochastic gradient descent

Y Lu, SY Meng, C De Sa - International Conference on Learning …, 2022 - par.nsf.gov
Training example order in SGD has long been known to affect convergence rate. Recent
results show that accelerated rates are possible in a variety of cases for permutation-based …

Convergence Analysis of Sequential Split Learning on Heterogeneous Data

Y Li, X Lyu - arXiv preprint arXiv:2302.01633, 2023 - arxiv.org
Federated Learning (FL) and Split Learning (SL) are two popular paradigms of distributed
machine learning. By offloading the computation-intensive portions to the server, SL is …
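The offloading idea can be shown with a forward pass that is literally split at a cut layer: the client runs the early layers and ships only the activations, while the server runs the compute-heavy remainder. The two-layer ReLU network below is an assumed toy model, not the paper's architecture.

```python
import numpy as np

def client_forward(x, W_client):
    # Client side: compute activations up to the cut layer; only these
    # activations (not raw data or the full model) leave the device.
    return np.maximum(0.0, x @ W_client)

def server_forward(h, W_server):
    # Server side: the compute-intensive remainder of the network.
    return h @ W_server

def split_forward(x, W_client, W_server):
    h = client_forward(x, W_client)      # runs on the client
    return server_forward(h, W_server)   # offloaded to the server
```

In the sequential variant, clients additionally take turns: each receives the client-side weights from its predecessor before training, which is what connects this setting to sequential FL.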

Permutation-Based SGD: Is Random Optimal?

S Rajput, K Lee, D Papailiopoulos - arXiv preprint arXiv:2102.09718, 2021 - arxiv.org
A recent line of ground-breaking results for permutation-based SGD has corroborated a
widely observed phenomenon: random permutations offer faster convergence than with …

Mini-Batch Optimization of Contrastive Loss

J Cho, K Sreenivasan, K Lee, K Mun, S Yi… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive learning has gained significant attention as a method for self-supervised
learning. The contrastive loss function ensures that embeddings of positive sample pairs …
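As a point of reference for what a mini-batch contrastive objective computes, here is an InfoNCE-style loss in which row i of the two embedding batches forms a positive pair and every other row serves as an in-batch negative; the normalization and temperature are common defaults assumed here, not necessarily the exact loss studied in the paper.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    # Mini-batch contrastive (InfoNCE-style) loss: z_a[i] and z_b[i] are a
    # positive pair; all other rows in the batch act as negatives.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # cross-entropy on the diagonal
```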