Deep learning at scale

P Viviani, M Drocco, D Baccega… - 2019 27th Euromicro …, 2019 - ieeexplore.ieee.org
This work presents a novel approach to distributed training of deep neural networks (DNNs)
that aims to overcome the issues related to mainstream approaches to data-parallel training …

Leader stochastic gradient descent for distributed training of deep learning models

Y Teng, W Gao, F Chalus… - Advances in …, 2019 - proceedings.neurips.cc
We consider distributed optimization under communication constraints for training deep
learning models. We propose a new algorithm, whose parameter updates rely on two forces …
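
As a rough sketch of the two-force idea (an illustration, not necessarily the authors' exact formulation): each worker descends its local gradient while being pulled toward a leader, taken here as the best-performing worker; the pull strength `lam` and the leader-selection rule are assumptions.

```python
import numpy as np

def leader_pull_step(params, grads, losses, lr=0.01, lam=0.1):
    """One update combining two forces per worker:
    (1) its local gradient, and (2) an attraction toward the
    leader, assumed here to be the lowest-loss worker."""
    leader = params[int(np.argmin(losses))]
    return [p - lr * (g + lam * (p - leader))
            for p, g in zip(params, grads)]
```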

Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools

R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had immense success in the recent past, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …

Simple, efficient and convenient decentralized multi-task learning for neural networks

A Bouchra Pilet, D Frey, F Taïani - … in Intelligent Data Analysis XIX: 19th …, 2021 - Springer
Machine learning, and in particular neural networks, requires large amounts of data,
which is increasingly highly distributed (e.g. over user devices, or independent storage …

Optimizing network performance in distributed machine learning

L Mai, C Hong, P Costa - 7th USENIX Workshop on Hot Topics in Cloud …, 2015 - usenix.org
To cope with the ever-growing availability of training data, there have been several
proposals to scale machine learning computation beyond a single server and distribute it …

Fast distributed deep learning over RDMA

J Xue, Y Miao, C Chen, M Wu, L Zhang… - Proceedings of the …, 2019 - dl.acm.org
Deep learning has emerged as an important new resource-intensive workload and has been
successfully applied in computer vision, speech, natural language processing, and so on …

Consolidating incentivization in distributed neural network training via decentralized autonomous organization

S Nikolaidis, I Refanidis - Neural Computing and Applications, 2022 - Springer
Big data has reignited research interest in machine learning. Massive quantities of data are
being generated regularly as a consequence of the development of the Internet, social …

DLB: a dynamic load balance strategy for distributed training of deep neural networks

Q Ye, Y Zhou, M Shi, Y Sun, J Lv - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Synchronous strategies with data parallelism are widely utilized in distributed training of
Deep Neural Networks (DNNs), largely owing to their ease of implementation yet promising …
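
For context on the synchronous data-parallel baseline the snippet refers to: every worker computes gradients on its own shard, the gradients are averaged across workers, and all replicas apply the identical update, so each step lasts as long as the slowest worker, which is the imbalance a dynamic load balance strategy targets. A minimal sketch, with a plain mean standing in for the collective (e.g. allreduce):

```python
import numpy as np

def synchronous_step(weights, shard_grads, lr=0.01):
    """Average per-shard gradients and apply one identical
    update on every replica; the mean stands in for an
    allreduce over workers."""
    avg_grad = np.mean(shard_grads, axis=0)
    return weights - lr * avg_grad
```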

An allreduce algorithm and network co-design for large-scale training of distributed deep learning

TT Nguyen, M Wahib - 2021 IEEE/ACM 21st International …, 2021 - ieeexplore.ieee.org
Distributed training of Deep Neural Networks (DNNs) on High-Performance Computing
(HPC) systems is becoming increasingly common. HPC systems dedicated entirely or …

GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models

T Dimlioglu, A Choromanska - International Conference on …, 2024 - proceedings.mlr.press
We study distributed training of deep learning models in time-constrained environments. We
propose a new algorithm that periodically pulls workers towards the center variable …
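
As a loose illustration of gradient-based weighted averaging (the inverse-gradient-norm weighting below is an assumption made for the sketch; the paper defines its own scheme): workers with smaller gradients contribute more to the center variable, and each worker is periodically pulled toward it.

```python
import numpy as np

def gradient_weighted_center(params, grads, eps=1e-8):
    """Center variable as a weighted average of worker parameters,
    with assumed weights ~ 1 / gradient norm (illustrative choice)."""
    w = np.array([1.0 / (np.linalg.norm(g) + eps) for g in grads])
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, params))

def pull_toward_center(p, center, lam=0.1):
    """Periodic pull of one worker's parameters toward the center."""
    return p - lam * (p - center)
```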