Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

L Liu, P Zhou, G Sun, X Chen, T Wu, H Yu, M Guizani - Neurocomputing, 2023 - Elsevier
With the widespread use of distributed machine learning (DML), many IT companies have
established networks dedicated to DML. Different communication architectures of DML have …
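
As a rough illustration of why the choice of communication architecture matters (not figures taken from the survey itself), the back-of-the-envelope calculation below compares per-iteration traffic for a single parameter server versus a ring all-reduce topology; the model size M and worker count n are assumed example values.

# Back-of-the-envelope traffic per training iteration (illustrative values only).
M = 400e6   # model/gradient size in bytes (assumed)
n = 16      # number of workers (assumed)

ps_per_worker   = 2 * M                 # push gradients up, pull parameters down
ps_at_server    = 2 * n * M             # the single server exchanges data with every worker
ring_per_worker = 2 * (n - 1) / n * M   # reduce-scatter + all-gather, no central hotspot

print(f"PS, per worker:   {ps_per_worker / 1e6:.0f} MB")
print(f"PS, at server:    {ps_at_server / 1e6:.0f} MB")
print(f"Ring, per worker: {ring_per_worker / 1e6:.0f} MB")

The per-worker volumes are similar; the difference is the 2nM concentrated at the parameter server, which is what alternative topologies try to avoid.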

When should the network be the computer?

DRK Ports, J Nelson - Proceedings of the Workshop on Hot Topics in …, 2019 - dl.acm.org
Researchers have repurposed programmable network devices to place small amounts of
application computation in the network, sometimes yielding orders-of-magnitude …

Compressed communication for distributed deep learning: Survey and quantitative evaluation

H Xu, CY Ho, AM Abdelmoniem, A Dutta, EH Bergou… - 2020 - repository.kaust.edu.sa
Powerful computer clusters are nowadays used to train complex deep neural networks
(DNNs) on large datasets. Distributed training workloads increasingly become …
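
As an illustrative aside (not code from the survey), the sketch below shows one of the simplest quantization schemes such surveys cover, 1-bit sign compression with a per-tensor scale, in plain NumPy.

import numpy as np

def compress_sign(grad):
    # Keep only the sign of each gradient entry plus one scale per tensor,
    # shrinking the payload from 32 bits to roughly 1 bit per element.
    scale = np.mean(np.abs(grad))
    return np.signbit(grad), scale

def decompress_sign(signs, scale):
    # Reconstruct an approximate gradient: -scale where the sign bit was set,
    # +scale elsewhere.
    return np.where(signs, -scale, scale)

grad = np.random.randn(1024).astype(np.float32)
signs, scale = compress_sign(grad)
approx = decompress_sign(signs, scale)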

Merlin HugeCTR: GPU-accelerated recommender system training and inference

Z Wang, Y Wei, M Lee, M Langer, F Yu, J Liu… - Proceedings of the 16th …, 2022 - dl.acm.org
In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open source, GPU-
accelerated integration framework for click-through rate estimation. It optimizes both training …

Prague: High-performance heterogeneity-aware asynchronous decentralized training

Q Luo, J He, Y Zhuo, X Qian - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org
Distributed deep learning training usually adopts All-Reduce as the synchronization
mechanism for data parallel algorithms due to its high performance in homogeneous …
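
For readers unfamiliar with the All-Reduce primitive these papers build on, the following single-process NumPy sketch simulates a ring all-reduce (reduce-scatter followed by all-gather); it is illustrative only and not the Prague implementation.

import numpy as np

def ring_allreduce(grads):
    # grads: list of equal-length 1-D arrays, one per simulated worker.
    n = len(grads)
    chunks = [list(np.array_split(g.astype(np.float64), n)) for g in grads]

    # Phase 1: reduce-scatter. After n-1 ring steps, worker i holds the
    # fully reduced chunk (i + 1) % n.
    for step in range(n - 1):
        for i in range(n):
            s = (i - step) % n          # chunk worker i forwards this step
            j = (i + 1) % n             # ring neighbour
            chunks[j][s] = chunks[j][s] + chunks[i][s]

    # Phase 2: all-gather. Each fully reduced chunk circulates around the
    # ring until every worker has every chunk.
    for step in range(n - 1):
        for i in range(n):
            s = (i + 1 - step) % n
            j = (i + 1) % n
            chunks[j][s] = chunks[i][s]

    return [np.concatenate(c) for c in chunks]

# Every worker ends up with the element-wise sum of all gradients.
workers = [np.random.randn(12) for _ in range(4)]
out = ring_allreduce(workers)
assert np.allclose(out[0], sum(workers))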

Geryon: Accelerating distributed CNN training by network-level flow scheduling

S Wang, D Li, J Geng - IEEE INFOCOM 2020-IEEE Conference …, 2020 - ieeexplore.ieee.org
Increasingly rich data sets and complicated models make distributed machine learning ever
more important. However, the cost of extensive and frequent parameter …
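
As a loose illustration of network-level flow scheduling in general (not Geryon's actual mechanism), the sketch below orders parameter-tensor transfers on a single bottleneck link so that tensors needed earliest in the next forward pass arrive first; the tensor names, sizes, and the 10 Gbps link are made-up values.

import heapq

def schedule_flows(flows, bandwidth):
    # flows: list of (priority, name, size_bytes); a lower priority value
    # means the tensor is needed earlier in the next forward pass.
    # Returns (name, completion_time) pairs on a single bottleneck link.
    heap = list(flows)
    heapq.heapify(heap)
    t, done = 0.0, []
    while heap:
        prio, name, size = heapq.heappop(heap)
        t += size / bandwidth
        done.append((name, t))
    return done

# Parameter tensors of a toy 4-layer model; layer 0 is needed first in the
# next forward pass, so it gets the highest priority (lowest value).
flows = [(3, "layer3.weight", 40e6), (2, "layer2.weight", 30e6),
         (1, "layer1.weight", 20e6), (0, "layer0.weight", 10e6)]
for name, t in schedule_flows(flows, bandwidth=10e9 / 8):  # 10 Gbps link
    print(f"{name} arrives at {t * 1e3:.2f} ms")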

An in-network architecture for accelerating shared-memory multiprocessor collectives

B Klenk, N Jiang, G Thorson… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
The slowdown of single-chip performance scaling, combined with the growing demand to
compute ever larger problems efficiently, has led to a renewed interest in distributed …

GRID: Gradient routing with in-network aggregation for distributed training

J Fang, G Zhao, H Xu, C Wu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
As the scale of distributed training increases, so does the communication overhead in
clusters. Some works try to reduce the communication cost through gradient compression or …
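
As a toy model of in-network gradient aggregation in general (not GRID's protocol), the sketch below has a simulated switch accumulate each gradient chunk from all workers and forward only the aggregate upstream.

import numpy as np

class ToySwitch:
    # Toy model of in-network aggregation: the switch keeps one accumulator
    # per gradient chunk and forwards a chunk upstream only once all workers
    # have contributed, so upstream traffic shrinks from
    # n_workers * n_chunks packets to n_chunks packets.
    def __init__(self, n_workers):
        self.n_workers = n_workers
        self.partial = {}   # chunk_id -> (accumulated sum, contributor count)
        self.upstream = []  # aggregated chunks forwarded toward the server

    def receive(self, chunk_id, payload):
        acc, cnt = self.partial.get(chunk_id, (np.zeros_like(payload), 0))
        acc, cnt = acc + payload, cnt + 1
        if cnt == self.n_workers:
            self.upstream.append((chunk_id, acc))   # forward the aggregate
            self.partial.pop(chunk_id, None)
        else:
            self.partial[chunk_id] = (acc, cnt)

n_workers, n_chunks = 4, 8
grads = np.random.randn(n_workers, n_chunks, 16)
switch = ToySwitch(n_workers)
for w in range(n_workers):
    for c in range(n_chunks):
        switch.receive(c, grads[w, c])

assert len(switch.upstream) == n_chunks
cid, agg = switch.upstream[0]
assert np.allclose(agg, grads[:, cid].sum(axis=0))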

Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes

P Sun, W Feng, R Han, S Yan, Y Wen - arXiv preprint arXiv:1902.06855, 2019 - arxiv.org
It is important to scale out deep neural network (DNN) training to reduce model training
time. The high communication overhead is one of the major performance bottlenecks for …

Gradient compression supercharged high-performance data parallel DNN training

Y Bai, C Li, Q Zhou, J Yi, P Gong, F Yan… - Proceedings of the …, 2021 - dl.acm.org
Gradient compression is a promising approach to alleviating the communication bottleneck
in data parallel deep neural network (DNN) training by significantly reducing the data …
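
As a generic illustration of the kind of compressor such systems accelerate (not this paper's implementation), here is a top-k sparsification step with error feedback in NumPy; the choice of k = 32 is arbitrary.

import numpy as np

def topk_with_error_feedback(grad, residual, k):
    # Top-k sparsification with error feedback: add the residual carried over
    # from previous rounds, transmit only the k largest-magnitude entries,
    # and keep the untransmitted remainder as the new residual.
    corrected = grad + residual
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    values = corrected[idx]
    new_residual = corrected.copy()
    new_residual[idx] = 0.0
    return idx, values, new_residual

residual = np.zeros(1024)
for step in range(3):
    grad = np.random.randn(1024)
    idx, values, residual = topk_with_error_feedback(grad, residual, k=32)
    # (idx, values) is what would actually be sent over the network:
    # 32 of 1024 entries, i.e. roughly 3% of the original payload.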