Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org
Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …

Online job scheduling in distributed machine learning clusters

Y Bao, Y Peng, C Wu, Z Li - IEEE INFOCOM 2018-IEEE …, 2018 - ieeexplore.ieee.org
Nowadays large-scale distributed machine learning systems have been deployed to support
various analytics and intelligence services in IT firms. To train a large dataset and derive the …

Deepweave: Accelerating job completion time with deep reinforcement learning-based coflow scheduling

P Sun, Z Guo, J Wang, J Li, J Lan, Y Hu - Proceedings of the Twenty-Ninth …, 2021 - ijcai.org
To improve the processing efficiency of jobs in distributed computing, the concept of coflow
is proposed. A coflow is a collection of flows that are semantically correlated in a multi-stage …

Efficient online coflow routing and scheduling

Y Li, SHC Jiang, H Tan, C Zhang, G Chen… - Proceedings of the 17th …, 2016 - dl.acm.org
A coflow is a collection of related parallel flows that occur typically between two stages of a
multi-stage compute task in a network, such as shuffle flows in MapReduce. The coflow …

Research progress and trends in traffic scheduling for data center networks

李文信, 齐恒, 徐仁海, 周晓波, 李克秋 - CHINESE JOURNAL OF …, 2020 - cjc.ict.ac.cn
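Abstract: In recent years, traffic scheduling has become a hot research topic in networking. The problem mainly decides when and at what
rate each data flow in the network is transmitted, which has a major impact on both network performance and application performance. However …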

An improved bound for minimizing the total weighted completion time of coflows in datacenters

M Shafiee, J Ghaderi - IEEE/ACM Transactions on Networking, 2018 - ieeexplore.ieee.org
In data-parallel computing frameworks, intermediate parallel data is often produced at
various stages which needs to be transferred among servers in the datacenter network (e.g. …

Joint online coflow routing and scheduling in data center networks

H Tan, SHC Jiang, Y Li, XY Li, C Zhang… - IEEE/ACM …, 2019 - ieeexplore.ieee.org
A coflow is a collection of related parallel flows that occur typically between two stages of a
multi-stage computing task in a network, such as shuffle flows in MapReduce. The coflow …

Efficient scheduling of weighted coflows in data centers

Z Wang, H Zhang, X Shi, X Yin, Y Li… - … on Parallel and …, 2019 - ieeexplore.ieee.org
Traditional network resource management mechanisms are mainly flow or packet based.
Recently, coflow has been proposed as a new abstraction to capture the communication …

Efficient online scheduling for coflow-aware machine learning clusters

W Li, S Chen, K Li, H Qi, R Xu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Distributed machine learning (DML) is an increasingly important workload. In a DML job,
each communication phase can comprise a coflow, and there are dependencies among its …

Online dispatching and scheduling of jobs with heterogeneous utilities in edge computing

C Zhang, H Tan, H Huang, Z Han, SHC Jiang… - Proceedings of the …, 2020 - dl.acm.org
Edge computing systems typically handle a wide variety of applications that exhibit diverse
degrees of sensitivity to job latency. Therefore, a multitude of utility functions of the job …