Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both of the …
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to …
Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a …
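The snippet is cut off before it names the primitive, but as a rough, hypothetical illustration of the kind of collective communication primitive that data-parallel training depends on, here is a minimal ring all-reduce sketch in plain Python/NumPy; the function name, chunking, and loop structure are illustrative assumptions, not the design described in the paper.

```python
# Minimal ring all-reduce sketch (illustrative only; not the primitive
# from the truncated snippet above). Every worker ends with the
# elementwise sum of all workers' gradients while exchanging only
# chunk-sized messages with its ring neighbor.
import numpy as np

def ring_allreduce(worker_grads):
    n = len(worker_grads)
    # Split each worker's gradient vector into n chunks.
    chunks = [np.array_split(np.asarray(g, dtype=float), n)
              for g in worker_grads]

    # Reduce-scatter: after n-1 steps, worker i holds the fully
    # reduced chunk (i + 1) % n.
    for t in range(n - 1):
        for i in range(n):
            c = (i - t - 1) % n  # chunk worker i receives and accumulates
            chunks[i][c] = chunks[i][c] + chunks[(i - 1) % n][c]

    # All-gather: circulate the reduced chunks around the ring so that
    # every worker ends up holding all of them.
    for t in range(n - 1):
        for i in range(n):
            c = (i - t) % n      # chunk worker i receives and overwrites
            chunks[i][c] = chunks[(i - 1) % n][c]

    return [np.concatenate(cs) for cs in chunks]

if __name__ == "__main__":
    grads = [np.arange(8.0) * (w + 1) for w in range(4)]
    reduced = ring_allreduce(grads)
    expected = sum(grads)
    assert all(np.allclose(r, expected) for r in reduced)
```

A known property of the ring algorithm is that each worker transmits roughly 2(n-1)/n times its gradient size in total, independent of worker count, which is why ring-style primitives are a common baseline that systems work in this area tries to beat.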
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often involving tens of thousands of GPUs running continuously for months. These models are …
Distributed deep neural network training (DT) systems are widely deployed in clusters where the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …
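To make the contention point concrete, here is a toy timeline model, with purely hypothetical numbers and function names, of two data-parallel jobs sharing one link: each job alternates a compute phase and a communication burst, and whether the bursts collide depends only on how the phases line up.

```python
# Toy model of two jobs sharing one link. Unit-length time slots; a job
# occupies the link only during its communication bursts. Hypothetical
# sketch, not the scheduling mechanism of the paper excerpted above.
def comm_slots(compute, comm, periods, offset=0):
    """Return the set of time slots in which the job uses the link."""
    slots = set()
    t = offset
    for _ in range(periods):
        t += compute                      # compute phase: link idle
        slots.update(range(t, t + comm))  # communication burst
        t += comm
    return slots

a = comm_slots(compute=3, comm=2, periods=4)
b_aligned = comm_slots(compute=3, comm=2, periods=4, offset=0)
b_staggered = comm_slots(compute=3, comm=2, periods=4, offset=2)

print(len(a & b_aligned))    # -> 8: every burst collides
print(len(a & b_staggered))  # -> 0: bursts interleave cleanly
```

Staggering one job by part of an iteration period interleaves the bursts so each job sees the full link bandwidth; exploiting this compute/communication phase structure is one intuition behind network-aware scheduling of co-located DT jobs.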
Z Wang, L Luo, Q Ning, C Zeng, W Li, X Wan… - … USENIX Symposium on …, 2023 - usenix.org
RDMA is expected to be highly scalable: to perform well in large-scale data center networks where packet losses are inevitable (i.e., high network scalability), and to support a large …
Deep learning recommendation models (DLRMs) have been used across many business-critical services at Meta and are the single largest AI application in terms of infrastructure …
R Gu, Y Chen, S Liu, H Dai, G Chen… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Deep learning (DL) is becoming increasingly popular in many domains, including computer vision, speech recognition, self-driving automobiles, etc. GPUs can train DL models efficiently …
Y Zhao, Y Liu, Y Peng, Y Zhu, X Liu, X Jin - Proceedings of the ACM …, 2022 - dl.acm.org
Training a Deep Learning (DL) model requires multiple resource types, including CPUs, GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …
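Since this snippet lists the resource types a training job cycles through, a minimal sketch of why overlapping them helps is given below: a bounded prefetch queue lets the storage/CPU stage of upcoming batches run concurrently with the compute stage of the current one. The `load_batch` and `train_step` names are hypothetical placeholders, not this paper's API.

```python
# Minimal prefetching sketch: overlap the storage/CPU stage of upcoming
# batches with the compute stage of the current one. `load_batch` and
# `train_step` are hypothetical stand-ins for real IO and GPU work.
import queue
import threading
import time

def prefetching_loader(load_batch, num_batches, depth=2):
    """Yield batches while a background thread loads the next ones."""
    q = queue.Queue(maxsize=depth)   # bounds how far IO runs ahead
    done = object()                  # sentinel marking end of stream

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not done:
        yield item

if __name__ == "__main__":
    def load_batch(i):               # stand-in storage/CPU stage
        time.sleep(0.01)
        return i

    def train_step(batch):           # stand-in compute stage
        time.sleep(0.01)

    start = time.time()
    for batch in prefetching_loader(load_batch, num_batches=50):
        train_step(batch)
    # With overlap, wall time approaches max(IO, compute) per batch
    # rather than their sum.
    print(f"elapsed: {time.time() - start:.2f}s")
```

The bounded queue (`depth=2`) is the key design choice: it keeps at most two batches in flight, so memory stays bounded while both resource types stay busy concurrently.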