ATP: In-network aggregation for multi-tenant learning

CL Lao, Y Le, K Mahajan, Y Chen, W Wu… - … USENIX Symposium on …, 2021 - usenix.org
Distributed deep neural network training (DT) systems are widely deployed in clusters where
the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …

Scaling distributed machine learning with in-network aggregation

A Sapio, M Canini, CY Ho, J Nelson, P Kalnis… - … USENIX Symposium on …, 2021 - usenix.org
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …
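
A minimal, self-contained sketch of the general in-network aggregation idea behind this line of work (not the paper's actual switch program, which runs on programmable switch ASICs): a software "switch" sums gradient chunks streamed by the workers and multicasts the aggregate back, so each worker sends and receives one chunk instead of exchanging full gradients pairwise. All function and variable names here are illustrative.

```python
# Toy illustration of in-network gradient aggregation (conceptual only).
import numpy as np

def switch_aggregate(chunks):
    """Sum one gradient chunk from every worker, as an in-network switch would."""
    return np.sum(chunks, axis=0)

def allreduce_via_switch(worker_grads, chunk_size):
    """Workers stream fixed-size chunks to the 'switch'; the switch returns
    the element-wise sum, which every worker adopts."""
    n = worker_grads[0].size
    out = [np.empty_like(g) for g in worker_grads]
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        agg = switch_aggregate([g[start:end] for g in worker_grads])
        for o in out:
            o[start:end] = agg          # switch multicasts the aggregate back
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.standard_normal(10) for _ in range(4)]   # 4 workers
    reduced = allreduce_via_switch(grads, chunk_size=3)
    assert np.allclose(reduced[0], np.sum(grads, axis=0))
```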

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs

W Wang, M Khazraee, Z Zhong, M Ghobadi… - … USENIX Symposium on …, 2023 - usenix.org
We propose TopoOpt, a novel direct-connect fabric for deep neural network (DNN) training
workloads. TopoOpt co-optimizes the distributed training process across three dimensions …

A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks

Y Li, J Park, M Alian, Y Yuan, Z Qu… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Training real-world Deep Neural Networks (DNNs) can take an eon (i.e., weeks or months)
without leveraging distributed systems. Even distributed training takes inordinate time, of …

Project adam: Building an efficient and scalable deep learning training system

T Chilimbi, Y Suzue, J Apacible… - 11th USENIX symposium …, 2014 - usenix.org
Large deep neural network models have recently demonstrated state-of-the-art accuracy on
hard visual recognition tasks. Unfortunately, such models are extremely time consuming to …

Alpa: Automating inter- and intra-operator parallelism for distributed deep learning

L Zheng, Z Li, H Zhang, Y Zhuang, Z Chen… - … USENIX Symposium on …, 2022 - usenix.org
Alpa automates model-parallel training of large deep learning (DL) models by generating
execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel …
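
As a rough illustration of what unifying data and intra-operator parallelism means for a single operator (a hand-written sketch, not Alpa's planner or API): the same matmul can be sharded along the batch dimension (data parallelism) or along the weight's output dimension (intra-operator parallelism), and a planner chooses such a layout per operator. The function names below are assumptions made for the example.

```python
# Two parallelism layouts for one matmul Y = X @ W, simulated on one machine.
import numpy as np

def data_parallel_matmul(X, W, n_devices):
    """Shard the batch dimension of X; every 'device' holds a full copy of W."""
    shards = np.array_split(X, n_devices, axis=0)
    return np.concatenate([x @ W for x in shards], axis=0)

def operator_parallel_matmul(X, W, n_devices):
    """Shard W along its output dimension; every 'device' sees the full X."""
    shards = np.array_split(W, n_devices, axis=1)
    return np.concatenate([X @ w for w in shards], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, W = rng.standard_normal((8, 16)), rng.standard_normal((16, 4))
    ref = X @ W
    assert np.allclose(data_parallel_matmul(X, W, 2), ref)
    assert np.allclose(operator_parallel_matmul(X, W, 2), ref)
```

Which layout is cheaper depends on the communication each one induces, which is exactly the trade-off an automated planner searches over.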

An in-network architecture for accelerating shared-memory multiprocessor collectives

B Klenk, N Jiang, G Thorson… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
The slowdown of single-chip performance scaling combined with the growing demands of
computing ever larger problems efficiently has led to a renewed interest in distributed …

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with hundreds of gigabytes per second (GB/s) of bandwidth …
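
A toy sketch of the compute-communication overlap this line of work studies (illustrative only; real platforms launch asynchronous collectives on accelerators, not Python threads): as soon as one layer's gradient is ready, its all-reduce is issued in the background while the gradient of the next layer is still being computed. The helper names and sleep-based "work" are placeholders.

```python
# Toy overlap of gradient communication with backward compute.
import time
from concurrent.futures import ThreadPoolExecutor

def backward_layer(layer):
    time.sleep(0.05)                 # stand-in for computing this layer's gradient
    return f"grad[{layer}]"

def all_reduce(grad):
    time.sleep(0.05)                 # stand-in for communicating the gradient
    return f"reduced {grad}"

def backward_with_overlap(n_layers):
    pending = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        for layer in reversed(range(n_layers)):
            grad = backward_layer(layer)                   # compute
            pending.append(comm.submit(all_reduce, grad))  # communicate in background
        return [f.result() for f in pending]

if __name__ == "__main__":
    start = time.time()
    print(backward_with_overlap(4))
    print(f"elapsed ~{time.time() - start:.2f}s (vs ~0.40s fully serialized)")
```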

Understanding data storage and ingestion for large-scale deep recommendation model training: Industrial product

M Zhao, N Agarwal, A Basant, B Gedik, S Pan… - Proceedings of the 49th …, 2022 - dl.acm.org
Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators
(DSA) are used to train increasingly complex deep learning models. These clusters rely on a …

In-network aggregation for shared machine learning clusters

N Gebara, M Ghobadi, P Costa - Proceedings of Machine …, 2021 - proceedings.mlsys.org
We present PANAMA, a network architecture for machine learning (ML) workloads on
shared clusters where a variety of training jobs co-exist. PANAMA consists of two key …