[PDF] Towards distributed machine learning in shared clusters: A dynamically-partitioned approach (2017)

P SUN, Y WEN, NBD TA, S YAN - Proceedings of the 2017 IEEE … - ink.library.smu.edu.sg
Many cluster management systems (CMSs) have been proposed to share a single cluster
with multiple distributed computing systems. However, none of the existing approaches can …

Baileys: An Efficient Distributed Machine Learning Framework by Dynamic Grouping

C Ni, H Du - Proceedings of the 2023 15th International Conference …, 2023 - dl.acm.org
Many machine-learning applications rely on distributed machine learning (DML) systems to
train models from massive datasets using massive computing resources (e.g., GPUs and …

Harmony: A scheduling framework optimized for multiple distributed machine learning jobs

WY Lee, Y Lee, WW Song, Y Yang… - 2021 IEEE 41st …, 2021 - ieeexplore.ieee.org
We introduce Harmony, a new scheduling framework that executes multiple Parameter-Server
ML training jobs together to improve cluster resource utilization. Harmony …

Rankmap: A platform-aware framework for distributed learning from dense datasets

A Mirhoseini, EL Dyer, E Songhori, RG Baraniuk… - arXiv preprint arXiv …, 2015 - arxiv.org
This paper introduces RankMap, a platform-aware end-to-end framework for efficient
execution of a broad class of iterative learning algorithms for massive and dense datasets …

[PDF] Scheduling Techniques in Resource Shared Large-Scale Clusters

E Hwang - 2019 - scholarworks.unist.ac.kr
To support various types of applications submitted by multiple users, a large-scale cluster
composed of different types of computing platforms, such as supercomputers, grids, and …

Toward Scalable Distributed Machine Learning on Data-Parallel Clusters

S Wang - 2020 - search.proquest.com
The rise of Big Data leads to demand for machine learning (ML) to train complex models
on huge volumes of input data. Thus, distributed ML is becoming prevalent in both academia …

GreedW: A Flexible and Efficient Decentralized Framework for Distributed Machine Learning

T Wang, X Jiang, Q Li, H Cai - IEEE Transactions on Computers, 2023 - ieeexplore.ieee.org
With the ever-increasing demand for computing power in deep learning, distributed training
techniques have proven effective in meeting that demand. However, existing …

Frugal Decentralized Learning

AM Kermarrec - 2022 IEEE International Parallel and …, 2022 - ieeexplore.ieee.org
Machine learning is currently shifting from a centralized paradigm to a decentralized one,
where machine learning models are trained collaboratively. In fully decentralized learning …

Semi-dynamic load balancing: Efficient distributed learning in non-dedicated environments

C Chen, Q Weng, W Wang, B Li, B Li - … of the 11th ACM Symposium on …, 2020 - dl.acm.org
Machine learning (ML) models are increasingly trained in clusters with non-dedicated
workers possessing heterogeneous resources. In such scenarios, model training efficiency …