From distributed machine to distributed deep learning: a comprehensive survey

M Dehghani, Z Yazdanparast - Journal of Big Data, 2023 - Springer
Artificial intelligence has made remarkable progress in handling complex tasks, thanks to
advances in hardware acceleration and machine learning algorithms. However, to acquire …

An in-depth analysis of distributed training of deep neural networks

Y Ko, K Choi, J Seo, SW Kim - 2021 IEEE International Parallel …, 2021 - ieeexplore.ieee.org
As the popularity of deep learning in industry rapidly grows, efficient training of deep neural
networks (DNNs) becomes important. To train a DNN with a large amount of data, distributed …

ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning

Y Ko, K Choi, H Jei, D Lee, SW Kim - Proceedings of the 30th ACM …, 2021 - dl.acm.org
To speed up the training of massive deep neural network (DNN) models, distributed training
has been widely studied. In general, centralized training, a type of distributed training …
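The centralized scheme this snippet refers to is typically organized around a parameter server that aggregates gradients from workers. As a rough, self-contained illustration of that general pattern (not ALADDIN's asymmetric algorithm; all names, data, and hyperparameters below are made up), here is a NumPy toy that simulates four workers whose gradients are averaged by a central server each step:

```python
# Toy single-machine simulation of centralized (parameter-server style)
# training: workers compute gradients on their own data shards and a
# central server averages them. Illustrative only, not ALADDIN's method.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data, split across 4 simulated workers.
X = rng.normal(size=(400, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.01 * rng.normal(size=400)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(5)  # parameters held by the "server"
lr = 0.1

for step in range(200):
    # Each worker computes the MSE-loss gradient on its own shard.
    grads = [2 * Xs.T @ (Xs @ w - ys) / len(ys) for Xs, ys in shards]
    # The server averages the worker gradients and updates the model.
    w -= lr * np.mean(grads, axis=0)

print("parameter error:", np.linalg.norm(w - true_w))
```

In a real deployment the averaging step is a network round-trip to the server, which is exactly where synchronization cost and worker asymmetry become the bottleneck that papers like this one target.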

Democratizing production-scale distributed deep learning

M Ma, H Pouransari, D Chao, S Adya… - arXiv preprint arXiv …, 2018 - arxiv.org
The interest and demand for training deep neural networks have been experiencing rapid
growth, spanning a wide range of applications in both academia and industry. However …

Distributed learning of deep neural networks using independent subnet training

B Yuan, CR Wolfe, C Dun, Y Tang, A Kyrillidis… - arXiv preprint arXiv …, 2019 - arxiv.org
Distributed machine learning (ML) can bring more computational resources to bear than
single-machine learning, thus enabling reductions in training time. Distributed learning …

A quick survey on large scale distributed deep learning systems

Z Zhang, L Yin, Y Peng, D Li - 2018 IEEE 24th International …, 2018 - ieeexplore.ieee.org
Deep learning has been widely used in various fields and has played a major role. With its
gradual penetration into various fields, the data quantity of each application is …

DBS: Dynamic batch size for distributed deep neural network training

Q Ye, Y Zhou, M Shi, Y Sun, J Lv - arXiv preprint arXiv:2007.11831, 2020 - arxiv.org
Synchronous strategies with data parallelism, such as the Synchronous Stochastic Gradient
Descent (S-SGD) and the model averaging methods, are widely utilized in distributed …
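To make the second synchronous family named in this snippet concrete, here is a hedged NumPy toy of model averaging: each replica takes several local SGD steps on its own shard, then all replicas are averaged. This is a generic sketch, not the DBS scheme itself (which adapts per-worker batch sizes); the data, learning rate, and averaging interval are invented for illustration.

```python
# Toy sketch of model averaging under data parallelism: each worker runs
# local SGD on its own shard, and replicas are periodically averaged.
# Generic illustration only, not the DBS algorithm.
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(400, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.01 * rng.normal(size=400)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

workers = [np.zeros(5) for _ in shards]  # one model replica per worker
lr, local_steps = 0.05, 5

for round_ in range(50):
    for i, (Xs, ys) in enumerate(shards):
        for _ in range(local_steps):  # independent local SGD steps
            grad = 2 * Xs.T @ (Xs @ workers[i] - ys) / len(ys)
            workers[i] -= lr * grad
    avg = np.mean(workers, axis=0)    # synchronization: average replicas
    workers = [avg.copy() for _ in workers]

print("parameter error:", np.linalg.norm(workers[0] - true_w))
```

Compared with S-SGD, which communicates gradients every step, model averaging communicates only once per `local_steps` iterations, trading synchronization frequency for some staleness between replicas.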

EP4DDL: addressing straggler problem in heterogeneous distributed deep learning

Z Ji, X Zhang, J Li, J Wei, Z Wei - The Journal of Supercomputing, 2022 - Springer
Driven by big data, neural networks have evolved to be more complex, and the computing capacity
of a single machine often cannot meet the demand. Distributed deep learning technology …

HPDL: towards a general framework for high-performance distributed deep learning

D Li, Z Lai, K Ge, Y Zhang, Z Zhang… - 2019 IEEE 39th …, 2019 - ieeexplore.ieee.org
With the growing scale of data volume and neural network size, we have entered the era
of distributed deep learning. High-performance training and inference on distributed …

Parallel and distributed training of deep neural networks: A brief overview

A Farkas, G Kertész, R Lovas - 2020 IEEE 24th International …, 2020 - ieeexplore.ieee.org
Deep neural networks and deep learning are becoming important and popular techniques in
modern services and applications. The training of these networks is computationally …