Authors
Hilmi Egemen Ciritoglu, Takfarinas Saber, Teodora Sandra Buda, John Murphy, Christina Thorpe
Publication date
2018/7/2
Conference
2018 IEEE International Congress on Big Data (BigData Congress)
Pages
104-111
Publisher
IEEE
Description
The Hadoop Distributed File System (HDFS) is the storage of choice when it comes to large-scale distributed systems. In addition to being efficient and scalable, HDFS provides high throughput and reliability through the replication of data. Recent work exploits this replication feature by dynamically varying the replication factor of in-demand data as a means of increasing data locality and achieving a performance improvement. However, to the best of our knowledge, no study has been performed on the consequences of varying the replication factor. In particular, our work is the first to show that although HDFS deals well with increasing the replication factor, it experiences problems with decreasing it. This leads to unbalanced data, hot spots, and performance degradation. In order to address this problem, we propose a new workload-aware balanced replica deletion algorithm. We also show that our algorithm …
Total citations
2018: 2, 2019: 5, 2020: 4, 2021: 5, 2022: 6, 2023: 2
Scholar articles
HE Ciritoglu, T Saber, TS Buda, J Murphy, C Thorpe - 2018 IEEE International Congress on Big Data …, 2018