Distributed learning of fully connected neural networks using independent subnet training

B Yuan, CR Wolfe, C Dun, Y Tang, A Kyrillidis… - Proceedings of the …, 2022 - par.nsf.gov
Distributed machine learning (ML) can bring more computational resources to bear than
single-machine learning, thus enabling reductions in training time. Distributed learning …
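
The snippet ends before the method itself, so as a rough illustration of the idea the title names (independent subnet training, in which the hidden units of a fully connected network are partitioned among workers and each worker trains only its own thin subnet before the pieces are written back), here is a toy Python/NumPy sketch. The network sizes, loss, and local SGD loop are illustrative assumptions, not the authors' actual algorithm, and the workers are simulated serially.

import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected net (d_in -> hidden -> d_out); hidden units are split
# into disjoint groups, one per worker.
d_in, hidden, d_out, workers = 8, 16, 4, 2
W1 = rng.normal(size=(d_in, hidden)) * 0.1   # input -> hidden weights
W2 = rng.normal(size=(hidden, d_out)) * 0.1  # hidden -> output weights
parts = np.array_split(np.arange(hidden), workers)

def local_sgd(W1_sub, W2_sub, X, Y, lr=0.05, steps=10):
    # Train one thin subnet (a slice of W1 and W2) on local data.
    for _ in range(steps):
        H = np.maximum(X @ W1_sub, 0.0)            # ReLU hidden activations
        G = (H @ W2_sub - Y) / len(X)              # squared-error output gradient
        gW2 = H.T @ G
        gW1 = X.T @ ((G @ W2_sub.T) * (H > 0))
        W1_sub -= lr * gW1
        W2_sub -= lr * gW2
    return W1_sub, W2_sub

X = rng.normal(size=(64, d_in))
Y = rng.normal(size=(64, d_out))
for p in parts:  # simulated serially; in practice each worker runs in parallel
    W1[:, p], W2[p, :] = local_sgd(W1[:, p].copy(), W2[p, :].copy(), X, Y)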

An in-depth analysis of distributed training of deep neural networks

Y Ko, K Choi, J Seo, SW Kim - 2021 IEEE International Parallel …, 2021 - ieeexplore.ieee.org
As the popularity of deep learning in industry rapidly grows, efficient training of deep neural
networks (DNNs) becomes important. To train a DNN with a large amount of data, distributed …

Democratizing production-scale distributed deep learning

M Ma, H Pouransari, D Chao, S Adya… - arXiv preprint arXiv …, 2018 - arxiv.org
Interest in and demand for training deep neural networks have grown rapidly, spanning a
wide range of applications in both academia and industry. However …

Consensus control for decentralized deep learning

L Kong, T Lin, A Koloskova… - … on Machine Learning, 2021 - proceedings.mlr.press
Decentralized training of deep learning models enables on-device learning over networks,
as well as efficient scaling to large compute clusters. Experiments in earlier works reveal …
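
As background for this entry, the baseline such work builds on is decentralized SGD with gossip averaging: each node mixes its parameters with its neighbours through a doubly stochastic matrix and then takes a local gradient step. The sketch below is that generic baseline on a ring of four nodes with toy quadratic losses, not the paper's consensus-control mechanism; all names and sizes are assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, dim, lr = 4, 3, 0.1

# Node i holds parameters x[i] and a local loss f_i(x) = 0.5 * ||x - t_i||^2,
# whose gradient is simply x - t_i.
targets = rng.normal(size=(n, dim))
x = rng.normal(size=(n, dim))

# Doubly stochastic mixing matrix for a ring: average with both neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

for step in range(200):
    grads = x - targets        # local gradients, one row per node
    x = W @ x - lr * grads     # gossip-average neighbours, then step locally

# Nodes reach consensus near the minimizer of the average loss.
print("consensus distance:", np.linalg.norm(x - x.mean(axis=0)))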

Communication-efficient decentralized learning with sparsification and adaptive peer selection

Z Tang, S Shi, X Chu - 2020 IEEE 40th International …, 2020 - ieeexplore.ieee.org
Increasing the size of machine learning models, especially deep neural network models,
can improve model generalization. However, large models require more …
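
To make the sparsification half of the title concrete, here is a minimal Python/NumPy sketch of generic top-k gradient sparsification: only the k largest-magnitude entries are transmitted, and the untransmitted residual is kept locally so it can be added to the next round's gradient (error feedback). This is the standard building block, not the paper's adaptive peer-selection scheme; the function names are assumptions.

import numpy as np

def sparsify_top_k(grad, k):
    # Keep only the k largest-magnitude entries; send (indices, values).
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def desparsify(idx, vals, shape):
    # Rebuild a dense gradient from the transmitted (indices, values) pair.
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = vals
    return flat.reshape(shape)

rng = np.random.default_rng(2)
g = rng.normal(size=(4, 5))
idx, vals = sparsify_top_k(g, k=3)             # transmit 3 of 20 entries
residual = g - desparsify(idx, vals, g.shape)  # kept locally for error feedback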

Communication optimization strategies for distributed deep neural network training: A survey

S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …

A flexible research-oriented framework for distributed training of deep neural networks

S Barrachina, A Castelló, M Catalán… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
We present PyDTNN, a framework for training deep neural networks (DNNs) on clusters of
computers that has been designed as a research-oriented tool with a low learning curve …

ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning

Y Ko, K Choi, H Jei, D Lee, SW Kim - Proceedings of the 30th ACM …, 2021 - dl.acm.org
To speed up the training of massive deep neural network (DNN) models, distributed training
has been widely studied. In general, centralized training, a type of distributed training, …
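
For orientation, the vanilla form of centralized training is the synchronous parameter-server loop: workers pull the current parameters, push gradients computed on their own data shards, and the server applies the averaged update. The sketch below shows that generic loop with toy quadratic losses; it is not ALADDIN's asymmetric protocol, and every name and size in it is an assumption.

import numpy as np

rng = np.random.default_rng(3)
dim, n_workers, lr = 5, 3, 0.1

# The server owns the global parameters; worker i sees only its shard,
# here modeled as a local loss f_i(x) = 0.5 * ||x - t_i||^2.
params = np.zeros(dim)
shards = rng.normal(size=(n_workers, dim))

def worker_gradient(params, target):
    return params - target  # gradient of 0.5 * ||params - target||^2

for step in range(100):
    # Synchronous round: every worker pulls params and pushes a gradient;
    # the server averages the gradients and updates.
    grads = [worker_gradient(params, t) for t in shards]
    params -= lr * np.mean(grads, axis=0)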

Efficient distributed inference of deep neural networks via restructuring and pruning

A Abdi, S Rashidi, F Fekri, T Krishna - Proceedings of the AAAI …, 2023 - ojs.aaai.org
In this paper, we consider the parallel implementation of an already-trained deep model on
multiple processing nodes (aka workers). Specifically, we investigate how a deep …
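
One simple way to parallelize inference of a trained fully connected model, in the spirit of this entry though not the paper's restructuring-and-pruning method, is to give each worker a disjoint group of hidden units together with their fan-in and fan-out weights, so each worker produces a partial output and only one reduction is needed at the end. The Python/NumPy sketch below checks that the partial outputs sum to the full model's output; all sizes and names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
d_in, hidden, d_out, workers = 8, 16, 4, 2

# Random weights stand in for an already-trained two-layer model.
W1 = rng.normal(size=(d_in, hidden))
W2 = rng.normal(size=(hidden, d_out))
parts = np.array_split(np.arange(hidden), workers)

def worker_forward(x, p):
    # A worker owns hidden units p plus their fan-in (W1 columns) and
    # fan-out (W2 rows), so it needs no mid-layer communication.
    h = np.maximum(x @ W1[:, p], 0.0)
    return h @ W2[p, :]

x = rng.normal(size=(1, d_in))
partials = [worker_forward(x, p) for p in parts]  # parallel in practice
y = np.sum(partials, axis=0)                      # single reduction at the end

assert np.allclose(y, np.maximum(x @ W1, 0.0) @ W2)  # matches the full model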