Custody: Towards data-aware resource sharing in cloud-based big data processing

S Ma, J Jiang, B Li, B Li - 2016 IEEE International Conference …, 2016 - ieeexplore.ieee.org

2016 IEEE International Conference on Cluster Computing (CLUSTER), 2016 - ieeexplore.ieee.org

With the advent of big data processing frameworks, the performance of data-parallel applications is heavily affected by the time it takes to read input data, making it important to improve data locality. Existing methods for achieving data locality have focused primarily on selecting the machines on which to place an application's tasks. However, the set of machines an application can choose from is determined by a cluster manager, which, in existing resource sharing frameworks, is oblivious to where the data reside. In this paper, we design, implement and evaluate Custody, a new cluster management framework that helps maximize data locality by allocating executor processes with local access to data to the applications that need them. Custody achieves this by dynamically collecting runtime information about each application's input data and by allocating executors among and within applications based on theoretical analyses of the data-aware resource sharing problem. With significantly better data locality, Custody avoids unnecessary network transfers and thus shortens job completion times. Our experimental results on a 100-node cluster demonstrate that Custody can improve data locality for input tasks by 36.9% compared with Spark's default cluster manager, and that it reduces job completion times by 14.9% owing to fewer network transfers.
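The abstract does not spell out Custody's allocation algorithm. As a rough illustration of what "data-aware" executor allocation among applications can mean, the Python sketch below hands out executors round-robin and, at each step, gives an application the free executor whose host stores the most of its not-yet-local input blocks. The greedy heuristic, the function names `allocate_executors` and `locality`, and the dictionary layout are all hypothetical assumptions for illustration; this is not the paper's method, which rests on its own theoretical analysis, and it ignores the within-application side entirely.

```python
from collections import Counter


def allocate_executors(executors, app_blocks, demand):
    """Hypothetical greedy, data-aware executor allocation (not Custody's algorithm).

    executors:  dict executor_id -> host name
    app_blocks: dict app -> list of sets; each set holds the hosts that store
                a replica of one input block of that app
    demand:     dict app -> maximum number of executors the app may receive
    Returns a dict app -> list of allocated executor ids.
    """
    free = dict(executors)  # executors not handed out yet
    alloc = {app: [] for app in app_blocks}
    # Input blocks of each app that no allocated executor can read locally yet.
    uncovered = {app: list(blocks) for app, blocks in app_blocks.items()}
    progress = True
    while free and progress:
        progress = False
        # Round-robin over apps, one executor per app per round: a simple
        # stand-in for sharing the cluster fairly among applications.
        for app in sorted(app_blocks):
            if not free or len(alloc[app]) >= demand[app]:
                continue
            # For each host, count the uncovered blocks it would make local.
            gain = Counter(h for replicas in uncovered[app] for h in replicas)
            # Pick the free executor whose host covers the most such blocks;
            # max() keeps the first maximum, so ties go to the smallest id.
            best = max(sorted(free), key=lambda e: gain[free[e]])
            host = free.pop(best)
            alloc[app].append(best)
            uncovered[app] = [r for r in uncovered[app] if host not in r]
            progress = True
    return alloc


def locality(alloc, executors, app_blocks):
    """Fraction of input blocks with a replica on a host running one of the app's executors."""
    local = total = 0
    for app, blocks in app_blocks.items():
        hosts = {executors[e] for e in alloc[app]}
        local += sum(1 for replicas in blocks if replicas & hosts)
        total += len(blocks)
    return local / total if total else 0.0


if __name__ == "__main__":
    executors = {"e1": "h1", "e2": "h2", "e3": "h3", "e4": "h4"}
    app_blocks = {
        "A": [{"h1", "h2"}, {"h1"}, {"h3"}],
        "B": [{"h4"}, {"h4"}, {"h2"}],
    }
    demand = {"A": 2, "B": 2}
    alloc = allocate_executors(executors, app_blocks, demand)
    print(alloc)  # {'A': ['e1', 'e3'], 'B': ['e4', 'e2']}
    print(locality(alloc, executors, app_blocks))  # 1.0
    # A data-oblivious split of the same four executors, for comparison.
    naive = {"A": ["e1", "e2"], "B": ["e3", "e4"]}
    print(f"{locality(naive, executors, app_blocks):.2f}")  # 0.67
```

In this toy cluster the data-aware split makes every input block local, while the oblivious split leaves two of the six blocks remote. That is the kind of gap the abstract attributes to a cluster manager that ignores where the data reside.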

Cited by: 4 · Related articles · All 5 versions
