Authors
Hilmi Egemen Ciritoglu, Leandro Batista de Almeida, Eduardo Cunha de Almeida, Teodora Sandra Buda, John Murphy, Christina Thorpe
Publication date
2018/4/2
Book
Companion of the 2018 ACM/SPEC International Conference on Performance Engineering
Pages
135-140
Description
The massive growth in the volume of data and the demand for big data utilisation have led to an increasing prevalence of Hadoop Distributed File System (HDFS) solutions. However, the performance of Hadoop, and indeed of HDFS, has some limitations and remains an open problem in the research community. The ultimate goal of our research is to develop an adaptive replication system; this paper presents the first phase of the work: an investigation into the replication factor used in HDFS to determine whether increasing the replication factor for in-demand data can improve the performance of the system. We constructed a physical Hadoop cluster for our experimental environment, using TestDFSIO and both real-world and synthetic data sets (NOAA and TPC-H, respectively) with Hive to validate our proposal. Results show that increasing the replication factor of the "hot" data increases the availability and locality of the …
Total citations
Citations per year (from chart): 2018: 3, 2019: 2, 2020: 4, 2021: 2, 2022: 2, 2023: 3