Efficient decentralized deep learning by dynamic model averaging

M Kamp, L Adilova, J Sicking, F Hüger… - Machine Learning and …, 2019 - Springer
We propose an efficient protocol for decentralized training of deep neural networks from
distributed data sources. The proposed protocol allows to handle different phases of model …

Eflops: Algorithm and system co-design for a high performance distributed training platform

J Dong, Z Cao, T Zhang, J Ye, S Wang… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Deep neural networks (DNNs) have gained tremendous attention as compelling solutions
for applications such as image classification, object detection, speech recognition, and so …

Stanza: Layer separation for distributed training in deep learning

X Wu, H Xu, B Li, Y Xiong - IEEE Transactions on Services …, 2020 - ieeexplore.ieee.org
The parameter server architecture is prevalently used for distributed deep learning. Each
worker machine in such a system trains the complete model, which leads to a large amount …

TensorOpt: Exploring the tradeoffs in distributed DNN training with auto-parallelism

Z Cai, X Yan, K Ma, Y Wu, Y Huang… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Effective parallelization strategies are crucial for the performance of distributed deep neural
network (DNN) training. Recently, several methods have been proposed to search …

Communication optimization strategies for distributed deep neural network training: A survey

S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …

Edge computing solutions for distributed machine learning

F Marozzo, A Orsino, D Talia… - 2022 IEEE Intl Conf on …, 2022 - ieeexplore.ieee.org
The rapid spread of the Internet of Things (IoT), with billions of connected devices, has
generated huge amounts of data and asks for decentralized solutions for machine learning …

Towards ubiquitous intelligent computing: Heterogeneous distributed deep neural networks

Z Zhang, T Song, L Lin, Y Hua, X He… - … Transactions on Big …, 2018 - ieeexplore.ieee.org
For the pursuit of ubiquitous computing, distributed computing systems containing the cloud,
edge devices, and Internet-of-Things devices are highly demanded. However, existing …

Performance modeling and scalability optimization of distributed deep learning systems

F Yan, O Ruwase, Y He, T Chilimbi - Proceedings of the 21th ACM …, 2015 - dl.acm.org
Big deep neural network (DNN) models trained on large amounts of data have recently
achieved the best accuracy on hard tasks, such as image and speech recognition. Training …

Slim-DP: A multi-agent system for communication-efficient distributed deep learning

S Sun, W Chen, J Bian, X Liu, TY Liu - Proceedings of the 17th …, 2018 - ifaamas.org
To afford the huge computational cost, large-scale deep neural networks (DNN) are usually
trained on the distributed system, especially the widely-used parameter server architecture …

A survey on distributed machine learning

J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …