Survey of data locality in Apache Hadoop

S Lee, JY Jo, Y Kim - … on Big Data, Cloud Computing, Data …, 2019 - ieeexplore.ieee.org
One of the key challenges in big data technology is the velocity at which the data is
processed. Hadoop, an open-source software framework, is the dominant technology to …

Hadoop performance analysis model with deep data locality

S Lee, JY Jo, Y Kim - Information, 2019 - mdpi.com
Background: Hadoop has become the base framework on the big data system via the simple
concept that moving computation is cheaper than moving data. Hadoop increases a data …

A review on data locality in Hadoop MapReduce

A Sharma, G Singh - 2018 Fifth International Conference on …, 2018 - ieeexplore.ieee.org
MapReduce has emerged as a strong model for processing parallel and distributed data for
huge datasets. Hadoop an open source implementation of MapReduce has approved …

Dependency-aware data locality for MapReduce

X Ma, X Fan, J Liu, D Li - IEEE Transactions on Cloud …, 2015 - ieeexplore.ieee.org
MapReduce effectively partitions and distributes computation workloads to a cluster of
servers, facilitating today's big data processing. Given the massive data to be dispatched …

Performance improvement of MapReduce process by promoting deep data locality

S Lee, JY Jo, Y Kim - … on data science and advanced analytics …, 2016 - ieeexplore.ieee.org
MapReduce has been widely used in many data science applications. It has been observed
that an excessive data transfer has a negative impact on its performance. To reduce the …

Optimizing data placement in heterogeneous Hadoop clusters

R Xiong, J Luo, F Dong - Cluster Computing, 2015 - Springer
Data placement decision of Hadoop distributed file system (HDFS) is very important for the
data locality which is a primary criterion for task scheduling of MapReduce model and …

DRAW: a new data-grouping-aware data placement scheme for data intensive applications with interest locality

J Wang, P Shang, J Yin - Cloud Computing for Data-Intensive Applications, 2014 - Springer
Recent years have seen an increasing number of scientists employ data parallel computing
frameworks such as MapReduce and Hadoop to run data intensive applications and …

IDaPS—Improved data-locality aware data placement strategy based on Markov clustering to enhance MapReduce performance on Hadoop

S Vengadeswaran, SR Balasundaram… - Journal of King Saud …, 2024 - Elsevier
The execution of Map-Reduce applications on the Hadoop cluster poses significant
challenges due to the non-consideration of data locality, i.e., assigning tasks to compute …

Improving Performance of Hadoop Clusters

J Xie - 2011 - search.proquest.com
The MapReduce model has become an important parallel processing model for large-scale
data-intensive applications like data mining and web indexing. Hadoop, an open-source …

vLocality: Revisiting data locality for MapReduce in virtualized clouds

X Ma, X Fan, J Liu, H Jiang, K Peng - IEEE Network, 2016 - ieeexplore.ieee.org
Recent years have witnessed a surge of new generation applications involving big data. The
de facto framework for big data processing, MapReduce, has been increasingly embraced …