Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. Distributed learning …
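The claim above — more machines, shorter training — usually rests on data parallelism: each worker computes gradients on its own data shard, and the gradients are averaged so all workers apply the same update. A minimal sketch, assuming a 1-D linear model and a pure-Python stand-in for the all-reduce step (none of these names come from the cited works):

```python
# Illustrative sketch: data-parallel distributed training.
# Each worker computes a gradient on its own shard; an all-reduce
# averages the gradients so every worker applies the same update.

def local_gradient(w, shard):
    # Gradient of mean squared error for the linear model y = w * x
    # over this worker's shard of (x, y) pairs.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    # Stand-in for a collective all-reduce across workers.
    return sum(grads) / len(grads)

def train_step(w, shards, lr=0.01):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel
    g = allreduce_mean(grads)                       # one communication round
    return w - lr * g

data = [(x, 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
shards = [data[0::2], data[1::2]]           # split across 2 workers

w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 3))  # converges toward 3.0
```

With two workers the per-step compute halves while only one gradient exchange is added per step, which is the trade-off the snippets below repeatedly revisit.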
Y Ko, K Choi, J Seo, SW Kim - 2021 IEEE International Parallel …, 2021 - ieeexplore.ieee.org
As the popularity of deep learning in industry rapidly grows, efficient training of deep neural networks (DNNs) becomes important. To train a DNN with a large amount of data, distributed …
M Ma, H Pouransari, D Chao, S Adya… - arXiv preprint arXiv …, 2018 - arxiv.org
The interest and demand for training deep neural networks have been experiencing rapid growth, spanning a wide range of applications in both academia and industry. However …
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters. Experiments in earlier works reveal …
Z Tang, S Shi, X Chu - 2020 IEEE 40th International …, 2020 - ieeexplore.ieee.org
The increasing size of machine learning models, especially deep neural network models, can improve the model generalization capability. However, large models require more …
S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the proliferation of studies on large-scale deep neural network training. However, the frequent …
We present PyDTNN, a framework for training deep neural networks (DNNs) on clusters of computers that has been designed as a research-oriented tool with a low learning curve …
Y Ko, K Choi, H Jei, D Lee, SW Kim - Proceedings of the 30th ACM …, 2021 - dl.acm.org
To speed up the training of massive deep neural network (DNN) models, distributed training has been widely studied. In general, centralized training, a type of distributed training, …
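The centralized variant mentioned in the snippet is commonly realized as a parameter server: one process owns the parameters, and workers pull them, compute gradients locally, and push them back. A hypothetical sketch of that pattern (class and function names are illustrative, not from the cited paper):

```python
# Illustrative sketch: centralized (parameter-server) training.
# The server owns the parameters; workers pull, compute a local
# gradient, and push it back; the server applies each update.

class ParameterServer:
    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grad):
        # Apply each worker's gradient as it arrives.
        self.w -= self.lr * grad

def worker_step(server, shard):
    w = server.pull()
    # Mean-squared-error gradient for the linear model y = w * x.
    grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    server.push(grad)

data = [(x, 2.0 * x) for x in range(1, 9)]  # true weight is 2.0
shards = [data[:4], data[4:]]               # one shard per worker

server = ParameterServer(w=0.0, lr=0.01)
for _ in range(300):
    for shard in shards:
        worker_step(server, shard)
print(round(server.w, 3))  # approaches 2.0
```

The server is a single point of aggregation, which simplifies consistency but can become the communication bottleneck — the usual motivation for the decentralized schemes discussed in the other snippets.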
In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (aka workers). Specifically, we investigate how a deep …