Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …

Federated learning with buffered asynchronous aggregation

J Nguyen, K Malik, H Zhan… - International …, 2022 - proceedings.mlr.press
Scalability and privacy are two critical concerns for cross-device federated learning (FL)
systems. In this work, we identify that synchronous FL cannot scale efficiently beyond a few …

Federated learning with non-iid data

Y Zhao, M Li, L Lai, N Suda, D Civin… - arXiv preprint arXiv …, 2018 - arxiv.org
Federated learning enables resource-constrained edge compute devices, such as mobile
phones and IoT devices, to learn a shared model for prediction, while keeping the training …

Adaptive methods for nonconvex optimization

M Zaheer, S Reddi, D Sachan… - Advances in neural …, 2018 - proceedings.neurips.cc
Adaptive gradient methods that rely on scaling gradients down by the square root of
exponential moving averages of past squared gradients, such as RMSProp, Adam, Adadelta …
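
The scaling rule this snippet refers to can be sketched in a few lines. The following is a minimal NumPy illustration of an RMSProp-style step, not code from the paper; the names `v`, `beta`, and `eps` are illustrative choices.

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=1e-3, beta=0.9, eps=1e-8):
    """One RMSProp-style update: scale the gradient down by the square root
    of an exponential moving average (EMA) of past squared gradients."""
    v = beta * v + (1 - beta) * grad ** 2       # EMA of squared gradients
    w = w - lr * grad / (np.sqrt(v) + eps)      # gradient step with per-coordinate scaling
    return w, v
```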

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …

Gradient sparsification for communication-efficient distributed optimization

J Wangni, J Wang, J Liu… - Advances in Neural …, 2018 - proceedings.neurips.cc
Modern large-scale machine learning applications require stochastic optimization
algorithms to be implemented on distributed computational architectures. A key bottleneck is …

Revisiting distributed synchronous SGD

J Chen, X Pan, R Monga, S Bengio… - arXiv preprint arXiv …, 2016 - arxiv.org
Distributed training of deep learning models on large-scale training data is typically
conducted with asynchronous stochastic optimization to maximize the rate of updates, at the …

Stochastic variance reduction for nonconvex optimization

SJ Reddi, A Hefny, S Sra, B Poczos… - … on machine learning, 2016 - proceedings.mlr.press
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient
(SVRG) methods for them. SVRG and related methods have recently surged into …

Federated variance-reduced stochastic gradient descent with robustness to byzantine attacks

Z Wu, Q Ling, T Chen… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This paper deals with distributed finite-sum optimization for learning over multiple workers in
the presence of malicious Byzantine attacks. Most resilient approaches so far combine …

Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization

SJ Reddi, S Sra, B Poczos… - Advances in neural …, 2016 - proceedings.neurips.cc
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems,
where the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of …