Accelerating Distributed Training With Collaborative In-Network Aggregation

J Fang, H Xu, G Zhao, Z Yu, B Shen… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
The surging scale of distributed training (DT) incurs significant communication overhead in
datacenters; a promising solution is in-network aggregation (INA), which leverages …

Constrained in-network computing with low congestion in datacenter networks

R Segal, C Avin, G Scalosub - IEEE INFOCOM 2022-IEEE …, 2022 - ieeexplore.ieee.org
Distributed computing has become common practice, and recent focus has turned to the use
of smart networking devices with in-network computing capabilities …

Shifted compression framework: Generalizations and improvements

E Shulgin, P Richtárik - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press
Communication is one of the key bottlenecks in the distributed training of large-scale
machine learning models, and lossy compression of exchanged information, such as …

A quantitative study of deep learning training on heterogeneous supercomputers

J Han, L Xu, M Rafique, AR Butt, SH Lim - 2019 - osti.gov
Deep learning (DL) has become a key technique for solving complex problems in scientific
research and discovery. DL training for science is substantially challenging because it has to …

Endpoint-flexible coflow scheduling across geo-distributed datacenters

W Li, X Yuan, K Li, H Qi, X Zhou… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Over the last decade, we have witnessed growing data volumes generated and stored
across geographically distributed datacenters. Processing such geo-distributed datasets …

CEFS: Compute-efficient flow scheduling for iterative synchronous applications

S Wang, D Li, J Zhang, W Lin - … of the 16th International Conference on …, 2020 - dl.acm.org
Iterative Synchronous Applications (ISApps), exemplified by distributed deep learning (DL) training,
are popular in today's data centers. In ISApps, multiple nodes carry out the computing …

XAgg: Accelerating Heterogeneous Distributed Training Through XDP-Based Gradient Aggregation

Q Zhang, G Zhao, H Xu, P Yang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
With the growth of model/dataset/system size for distributed model training in datacenters,
the widely used Parameter Server (PS) architecture suffers from a communication bottleneck …

DGT: A contribution-aware differential gradient transmission mechanism for distributed machine learning

H Zhou, Z Li, Q Cai, H Yu, S Luo, L Luo… - Future Generation …, 2021 - Elsevier
Distributed machine learning is a mainstream approach to learning insights for analytics and
intelligence services on many fronts (e.g., health, streaming, and business) from their massive …

Horizontal or vertical? A hybrid approach to large-scale distributed machine learning

J Geng, D Li, S Wang - Proceedings of the 10th Workshop on Scientific …, 2019 - dl.acm.org
Data parallelism and model parallelism are two typical parallel modes for distributed
machine learning (DML). Traditionally, DML mainly leverages data parallelism, which …

PSNet: Reconfigurable network topology design for accelerating parameter server architecture based distributed machine learning

L Liu, Q Jin, D Wang, H Yu, G Sun, S Luo - Future Generation Computer …, 2020 - Elsevier
The bottleneck of Distributed Machine Learning (DML) has shifted from computation
to communication. Many works have focused on speeding up the communication phase from …