Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
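A minimal sketch of a FedAvg-style round as described in the snippet (not the cited paper's exact algorithm): the server broadcasts the global model, each client runs a few local SGD steps on its own data, and the server takes a data-weighted average of the results. The least-squares loss, step counts, and helper names such as `local_sgd` are illustrative assumptions.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, num_local_steps=5):
    """Run a few local SGD steps on one client's data (least-squares loss)."""
    for _ in range(num_local_steps):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5*||Xw - y||^2 / n
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    """One FedAvg round: broadcast, local training, data-weighted averaging."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    local_models = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    weights = sizes / sizes.sum()
    return sum(wk * lk for wk, lk in zip(weights, local_models))

# toy usage: 3 clients, 2-dimensional linear model
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(2)
for _ in range(10):
    w = fedavg_round(w, clients)
```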
J Cha, J Lee, C Yun - International Conference on Machine …, 2023 - proceedings.mlr.press
We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most …
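For context, a minimal sketch of the without-replacement (shuffled) SGD scheme analyzed in such work, applied to a toy finite-sum problem; the quadratic components and step size are illustrative assumptions, and the sketch says nothing about the lower bounds themselves.

```python
import numpy as np

def shuffled_sgd(grads, w0, lr, num_epochs, rng):
    """Without-replacement SGD: each epoch visits all n component gradients
    in a fresh random permutation (random reshuffling)."""
    n, w = len(grads), w0.copy()
    for _ in range(num_epochs):
        for i in rng.permutation(n):     # sample without replacement
            w = w - lr * grads[i](w)
    return w

# toy finite sum: f(w) = (1/n) * sum_i 0.5*||w - c_i||^2, minimizer = mean(c_i)
rng = np.random.default_rng(1)
cs = rng.normal(size=(8, 3))
grads = [lambda w, c=c: w - c for c in cs]   # gradient of each component
w_hat = shuffled_sgd(grads, np.zeros(3), lr=0.05, num_epochs=200, rng=rng)
```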
C Yun, S Rajput, S Sra - arXiv preprint arXiv:2110.10342, 2021 - arxiv.org
In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
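A hedged sketch of the two methods this snippet compares, under simplifying assumptions (deterministic per-worker gradients, fixed step size): minibatch SGD takes one synchronized step per round using the average of all workers' gradients, while local SGD lets each worker take several independent steps before the iterates are averaged.

```python
import numpy as np

def minibatch_sgd_round(w, worker_grads, lr):
    """Minibatch SGD baseline: one synchronized step per round using the
    average of every worker's gradient at the shared iterate."""
    g = np.mean([grad(w) for grad in worker_grads], axis=0)
    return w - lr * g

def local_sgd_round(w, worker_grads, lr, K):
    """Local SGD (a.k.a. federated averaging): each worker takes K independent
    steps from the same starting point, then the iterates are averaged."""
    local_iterates = []
    for grad in worker_grads:
        v = w.copy()
        for _ in range(K):
            v = v - lr * grad(v)
        local_iterates.append(v)
    return np.mean(local_iterates, axis=0)

# toy usage: each worker's objective is 0.5*||w - t||^2 for its own target t
rng = np.random.default_rng(0)
targets = rng.normal(size=(4, 3))
worker_grads = [lambda w, t=t: w - t for t in targets]
w = np.zeros(3)
for _ in range(50):
    w = local_sgd_round(w, worker_grads, lr=0.1, K=5)
```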
Gradient compression is a popular technique for improving communication complexity of stochastic first-order methods in distributed training of machine learning models. However …
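As an illustration of the kind of compressor such methods rely on, a top-k sparsification sketch in NumPy; top-k is only one common choice, and real systems typically pair it with error feedback, which is omitted here.

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude coordinates of the gradient;
    everything else is zeroed out before communication."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

# each worker sends only k values plus indices instead of the full gradient
g = np.random.default_rng(2).normal(size=1000)
g_sent = topk_compress(g, k=50)   # ~95% fewer nonzero entries communicated
```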
We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good …
Y Lu, SY Meng, C De Sa - International Conference on Learning …, 2022 - par.nsf.gov
Training example order in SGD has long been known to affect convergence rate. Recent results show that accelerated rates are possible in a variety of cases for permutation-based …
Y Li, X Lyu - arXiv preprint arXiv:2302.01633, 2023 - arxiv.org
Federated Learning (FL) and Split Learning (SL) are two popular paradigms of distributed machine learning. By offloading the computation-intensive portions to the server, SL is …
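A minimal sketch of the split-learning handoff this snippet describes, assuming a two-layer linear model with the cut placed between client and server: only activations and their gradients cross the boundary, while the server carries the computation-heavy layers. The shapes, squared loss, and learning rate are illustrative assumptions.

```python
import numpy as np

# Split learning sketch: the client holds layer 1, the server holds layer 2.
rng = np.random.default_rng(3)
W1 = rng.normal(size=(4, 8)) * 0.1   # client-side layer
W2 = rng.normal(size=(8, 1)) * 0.1   # server-side layer
X, y = rng.normal(size=(16, 4)), rng.normal(size=(16, 1))
lr = 0.01

for _ in range(100):
    # client: forward through its layers, send the "smashed" activations
    A = X @ W1                      # sent to the server
    # server: finish the forward pass, compute loss, backprop to the cut layer
    pred = A @ W2
    dpred = (pred - y) / len(y)     # gradient of 0.5 * mean squared error
    dW2 = A.T @ dpred
    dA = dpred @ W2.T               # sent back to the client
    W2 -= lr * dW2
    # client: finish backpropagation through its own layers
    dW1 = X.T @ dA
    W1 -= lr * dW1
```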
A recent line of ground-breaking results for permutation-based SGD has corroborated a widely observed phenomenon: random permutations offer faster convergence than with …
Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs …
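A minimal InfoNCE-style sketch of the contrastive loss this snippet refers to: each anchor's embedding is scored against its own positive and against the other samples in the batch, which act as negatives. The temperature, L2 normalization, and batch construction are assumptions, not details from the cited work.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor should match its own
    positive against all other positives in the batch (treated as negatives)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # diagonal entries are positives

rng = np.random.default_rng(4)
z1, z2 = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
loss = info_nce_loss(z1, z2)
```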