The evolution of distributed systems for graph neural networks and their origin in graph processing and deep learning: A survey

J Vatter, R Mayer, HA Jacobsen - ACM Computing Surveys, 2023 - dl.acm.org
Graph neural networks (GNNs) are an emerging research field. This specialized deep
neural network architecture is capable of processing graph-structured data and bridges the …

Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools

R Mayer, HA Jacobsen - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) has had immense success in recent years, leading to state-of-the-
art results in various domains, such as image recognition and natural language processing …

MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With sustained technological advances in machine learning (ML) and the recent availability of
massive datasets, tech companies are deploying large ML-as-a-Service (MLaaS) …

Privacy preserving machine learning with homomorphic encryption and federated learning

H Fang, Q Qian - Future Internet, 2021 - mdpi.com
Privacy protection has become an important concern with the great success of machine
learning. This paper proposes a multi-party privacy-preserving machine learning …

Biscotti: A blockchain system for private and secure federated learning

M Shayan, C Fung, CJM Yoon… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Federated Learning is the current state-of-the-art in supporting secure multi-party machine
learning (ML): data is maintained on the owner's device and the updates to the model are …

Revisiting distributed synchronous SGD

J Chen, X Pan, R Monga, S Bengio… - arXiv preprint arXiv …, 2016 - arxiv.org
Distributed training of deep learning models on large-scale training data is typically
conducted with asynchronous stochastic optimization to maximize the rate of updates, at the …

Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters

H Zhang, Z Zheng, S Xu, W Dai, Q Ho, X Liang… - 2017 USENIX Annual …, 2017 - usenix.org
Deep learning models can take weeks to train on a single GPU-equipped machine,
necessitating scaling out DL training to a GPU-cluster. However, current distributed DL …

nnScaler: Constraint-guided parallelization plan generation for deep learning training

Z Lin, Y Miao, Q Zhang, F Yang, Y Zhu, C Li… - … USENIX Symposium on …, 2024 - usenix.org
With the growing model size of deep neural networks (DNN), deep learning training is
increasingly relying on handcrafted search spaces to find efficient parallelization execution …

Decentralized federated learning: A segmented gossip approach

C Hu, J Jiang, Z Wang - arXiv preprint arXiv:1908.07782, 2019 - arxiv.org
The emerging concern about data privacy and security has motivated the proposal of
federated learning, which allows nodes to only synchronize the locally-trained models …

AntMan: Dynamic scaling on GPU clusters for deep learning

W Xiao, S Ren, Y Li, Y Zhang, P Hou, Z Li… - … USENIX Symposium on …, 2020 - usenix.org
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …