A comprehensive survey on training acceleration for large machine learning models in IoT

H Wang, Z Qu, Q Zhou, H Zhang, B Luo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
The ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …

Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent

X Lian, C Zhang, H Zhang, CJ Hsieh… - Advances in neural …, 2017 - proceedings.neurips.cc
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are
built in a centralized fashion. One bottleneck of centralized algorithms lies in the high …
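
For orientation, the decentralized alternative this paper analyzes (D-PSGD) alternates neighborhood averaging with local stochastic gradient steps. Below is a minimal numpy sketch of that pattern; the ring topology, step size, and per-node least-squares objectives are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, gamma, steps = 8, 5, 0.05, 200

# Ring topology: each node mixes itself with its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = W[i, (i - 1) % n_nodes] = W[i, (i + 1) % n_nodes] = 1.0 / 3.0

# Node i holds a private least-squares objective ||A_i x - b_i||^2.
A = rng.normal(size=(n_nodes, 20, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.1 * rng.normal(size=(n_nodes, 20))

x = rng.normal(size=(n_nodes, dim))        # one parameter vector per node
for _ in range(steps):
    x = W @ x                              # gossip: average with neighbors
    for i in range(n_nodes):
        j = rng.integers(20)               # sample one local data point
        grad = 2 * A[i, j] * (A[i, j] @ x[i] - b[i, j])
        x[i] -= gamma * grad               # local stochastic gradient step

print("disagreement across nodes:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to ground truth:", np.linalg.norm(x.mean(axis=0) - x_true))
```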

Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
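
The paper's theme is compressing what workers transmit; a common building block is a sparsify-then-quantize operator combined with error feedback, sketched below. The specific compressor here (top-k followed by sign/mean-magnitude quantization) and the dimensions are illustrative stand-ins, not the paper's exact Qsparse-local-SGD operator.

```python
import numpy as np

def topk_quantize(v, k):
    """Keep the k largest-magnitude entries, then replace each kept entry
    by sign(v) times the mean kept magnitude: sparsify, then quantize."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    scale = np.abs(v[idx]).mean()
    out[idx] = scale * np.sign(v[idx])
    return out

rng = np.random.default_rng(1)
dim, k = 100, 10
memory = np.zeros(dim)                   # error-feedback residual
for step in range(5):
    g = rng.normal(size=dim)             # stand-in for a local gradient
    compressed = topk_quantize(g + memory, k)
    memory = (g + memory) - compressed   # dropped mass is kept for later
    # `compressed` is all a worker would transmit this round
    print(step, np.count_nonzero(compressed), round(np.linalg.norm(memory), 3))
```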

Asynchronous decentralized parallel stochastic gradient descent

X Lian, W Zhang, C Zhang, J Liu - … Conference on Machine …, 2018 - proceedings.mlr.press
Most commonly used distributed machine learning systems are either synchronous or
centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a …
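
The asynchronous decentralized pattern can be pictured as event-driven pairwise gossip: one worker wakes, averages with a random peer, and applies a gradient that was read before the exchange (so it is slightly stale). The following sketch simulates that schedule in the spirit of AD-PSGD; the peer selection, quadratic objective, and step size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, dim, gamma = 6, 4, 0.1
x = rng.normal(size=(n_nodes, dim))
target = np.ones(dim)                    # all nodes share f_i = 0.5||x - 1||^2

for _ in range(500):
    i = rng.integers(n_nodes)            # the worker that fires this event
    grad = x[i] - target                 # gradient read BEFORE the exchange,
                                         # so it is applied slightly stale
    j = (i + rng.integers(1, n_nodes)) % n_nodes   # uniformly random peer
    x[i] = x[j] = 0.5 * (x[i] + x[j])    # pairwise gossip averaging
    x[i] = x[i] - gamma * grad           # apply the earlier gradient

print("max deviation from optimum:", np.abs(x - target).max())
```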

Push–pull gradient methods for distributed optimization in networks

S Pu, W Shi, J Xu, A Nedić - IEEE Transactions on Automatic …, 2020 - ieeexplore.ieee.org
In this article, we focus on solving a distributed convex optimization problem in a network,
where each agent has its own convex cost function and the goal is to minimize the sum of …
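
The push-pull idea mixes the decision variable with a row-stochastic matrix ("pull") while a gradient tracker is mixed with a column-stochastic matrix ("push"), which suits directed graphs. Below is a small numpy sketch in that spirit; the directed ring, quadratic costs, and step size are illustrative assumptions rather than the paper's exact method or conditions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, dim, alpha = 5, 3, 0.1

# Directed ring: R is row stochastic (rows sum to 1), C is column
# stochastic (columns sum to 1); they need not be equal.
R = np.zeros((n, n))
C = np.zeros((n, n))
for i in range(n):
    R[i, i] = R[i, (i - 1) % n] = 0.5
    C[i, i] = C[(i + 1) % n, i] = 0.5

t = rng.normal(size=(n, dim))            # agent i minimizes 0.5||x - t_i||^2
grad = lambda z: z - t                   # stacked local gradients

x = np.zeros((n, dim))
y = grad(x)                              # tracker starts at the gradients
for _ in range(300):
    x_next = R @ (x - alpha * y)         # "pull" the decision variables
    y = C @ y + grad(x_next) - grad(x)   # "push" the gradient tracker
    x = x_next

print("distance to optimum:", np.linalg.norm(x - t.mean(axis=0), axis=1))
```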

Decentralized federated learning: Balancing communication and computing costs

W Liu, L Chen, W Zhang - IEEE Transactions on Signal and …, 2022 - ieeexplore.ieee.org
Decentralized stochastic gradient descent (SGD) is a driving engine for decentralized
federated learning (DFL). The performance of decentralized SGD is jointly influenced by …

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

Z Li, W Shi, M Yan - IEEE Transactions on Signal Processing, 2019 - ieeexplore.ieee.org
This paper proposes a novel proximal-gradient algorithm for a decentralized optimization
problem with a composite objective containing smooth and nonsmooth terms. Specifically …
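
Composite objectives of this kind are typically handled by pairing a gradient step on the smooth term with a proximal step on the nonsmooth term. Below is a generic decentralized proximal-gradient sketch using the L1 prox (soft-thresholding); it shows only the pattern and is not the paper's exact recursion or its network-independent step-size rule.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1, which handles the nonsmooth term."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(4)
n, dim, alpha, lam = 6, 8, 0.2, 0.05
W = np.full((n, n), 1.0 / n)             # fully connected, uniform weights
t = rng.normal(size=(n, dim))            # smooth part: f_i(x) = 0.5||x - t_i||^2

x = np.zeros((n, dim))
for _ in range(200):
    mixed = W @ x                        # consensus mixing
    grads = x - t                        # gradients of the smooth terms
    x = soft_threshold(mixed - alpha * grads, alpha * lam)  # prox step

consensus = x.mean(axis=0)
print("nonzero coordinates:", np.count_nonzero(np.abs(consensus) > 1e-6))
```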

COLA: Decentralized linear learning

L He, A Bian, M Jaggi - Advances in Neural Information …, 2018 - proceedings.neurips.cc
Decentralized machine learning is a promising emerging paradigm in view of global
challenges of data ownership and privacy. We consider learning of linear classification and …

Decentralized stochastic bilevel optimization with improved per-iteration complexity

X Chen, M Huang, S Ma… - … on Machine Learning, 2023 - proceedings.mlr.press
Bilevel optimization has recently received tremendous attention due to its great success in
solving important machine learning problems like meta-learning, reinforcement learning …

Robust and communication-efficient collaborative learning

A Reisizadeh, H Taheri, A Mokhtari… - Advances in …, 2019 - proceedings.neurips.cc
We consider a decentralized learning problem, where a set of computing nodes aims to
solve a non-convex optimization problem collaboratively. It is well-known that …