Strategies and principles of distributed machine learning on big data

EP Xing, Q Ho, P Xie, D Wei - Engineering, 2016 - Elsevier
The rise of big data has led to new demands for machine learning (ML) systems to learn
complex models, with millions to billions of parameters, that promise adequate capacity to …

Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey

B Qian, J Su, Z Wen, DN Jha, Y Li, Y Guan… - ACM Computing …, 2020 - dl.acm.org
Machine Learning (ML) and Internet of Things (IoT) are complementary advances: ML
techniques unlock the potential of IoT with intelligence, and IoT applications increasingly …

GPipe: Efficient training of giant neural networks using pipeline parallelism

Y Huang, Y Cheng, A Bapna, O Firat… - Advances in neural …, 2019 - proceedings.neurips.cc
Scaling up deep neural network capacity has been known as an effective approach to
improving model quality for several different machine learning tasks. In many cases …

PipeDream: Fast and efficient pipeline parallel DNN training

A Harlap, D Narayanan, A Phanishayee… - arXiv preprint arXiv …, 2018 - arxiv.org
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes
computation by pipelining execution across multiple machines. Its pipeline parallel …

Petuum: A new platform for distributed machine learning on big data

EP Xing, Q Ho, W Dai, JK Kim, J Wei, S Lee… - Proceedings of the 21st …, 2015 - dl.acm.org
How can one build a distributed framework that allows efficient deployment of a wide
spectrum of modern advanced machine learning (ML) programs for industrial-scale …

FDML: A collaborative machine learning framework for distributed features

Y Hu, D Niu, J Yang, S Zhou - Proceedings of the 25th ACM SIGKDD …, 2019 - dl.acm.org
Most current distributed machine learning systems try to scale up model training by using a
data-parallel architecture that divides the computation for different samples among workers …

Pyramid: Enabling hierarchical neural networks with edge computing

Q He, Z Dong, F Chen, S Deng, W Liang… - Proceedings of the ACM …, 2022 - dl.acm.org
Machine learning (ML) is powering a rapidly-increasing number of web applications. As a
crucial part of 5G, edge computing facilitates edge artificial intelligence (AI) by ML model …

HetPipe: Enabling large DNN training on (whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism

JH Park, G Yun, MY Chang, NT Nguyen, S Lee… - 2020 USENIX Annual …, 2020 - usenix.org
Deep Neural Network (DNN) models have continuously been growing in size in order to
improve the accuracy and quality of the models. Moreover, for training of large DNN models …

SiP-ML: high-bandwidth optical network interconnects for machine learning training

M Khani, M Ghobadi, M Alizadeh, Z Zhu… - Proceedings of the …, 2021 - dl.acm.org
This paper proposes optical network interconnects as a key enabler for building high-
bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML …

LightLDA: Big topic models on modest computer clusters

J Yuan, F Gao, Q Ho, W Dai, J Wei, X Zheng… - Proceedings of the 24th …, 2015 - dl.acm.org
When building large-scale machine learning (ML) programs, such as massive topic models
or deep neural networks with up to trillions of parameters and training examples, one usually …