A data skew oriented reduce placement algorithm based on sampling

Z Tang, W Ma, K Li, K Li - IEEE Transactions on Cloud …, 2016 - ieeexplore.ieee.org
For frequent disk I/O and large data transmissions among different racks and physical
nodes, intermediate data communication has become the most important performance bottle …

Draw: a new data-grouping-aware data placement scheme for data intensive applications with interest locality

J Wang, P Shang, J Yin - Cloud Computing for Data-Intensive Applications, 2014 - Springer
Recent years have seen an increasing number of scientists employ data parallel computing
frameworks such as MapReduce and Hadoop to run data intensive applications and …

A counter-based profiling scheme for improving locality through data and reducer placement

MW Hussain, DS Roy - Advances in Machine Learning for Big Data …, 2022 - Springer
Hadoop has been regarded as the de-facto standard for handling data-intensive distributed
applications with its popular storage and processing engine called as the Hadoop …

Optimizing data placement in heterogeneous Hadoop clusters

R Xiong, J Luo, F Dong - Cluster Computing, 2015 - Springer
Data placement decision of Hadoop distributed file system (HDFS) is very important for the
data locality which is a primary criterion for task scheduling of MapReduce model and …

HaDaap: a hotness‐aware data placement strategy for improving storage efficiency in heterogeneous Hadoop clusters

R Xiong, Y Du, J Jin, J Luo - Concurrency and Computation …, 2018 - Wiley Online Library
Enterprises increasingly use the Hadoop Distributed File System (HDFS) to manage and
store big data for many applications. However, HDFS uses triple replication, leading to …

A new data-grouping-aware dynamic data placement method that take into account jobs execute frequency for Hadoop

J Wu, C Zhang, B Zhang, P Wang - Microprocessors and Microsystems, 2016 - Elsevier
Recent years have seen an increasing number of scientists employing data parallel
computing frameworks, such as Hadoop, in order to run data-intensive applications …

An optimal data placement strategy for improving system performance of massive data applications using graph clustering

S Vengadeswaran, SR Balasundaram - International Journal of …, 2018 - igi-global.com
This article describes how the time taken to execute a query and return the results, increase
exponentially as the data size increases, leading to more waiting times of the user. Hadoop …

Survey of data locality in apache hadoop

S Lee, JY Jo, Y Kim - … on Big Data, Cloud Computing, Data …, 2019 - ieeexplore.ieee.org
One of the key challenges in big data technology is the velocity at which the data is
processed. Hadoop, an open-source software framework, is the dominant technology to …

IDaPS—Improved data-locality aware data placement strategy based on Markov clustering to enhance MapReduce performance on Hadoop

S Vengadeswaran, SR Balasundaram… - Journal of King Saud …, 2024 - Elsevier
The execution of Map-Reduce applications on the Hadoop cluster poses significant
challenges due to the non-consideration of data locality, i.e., assigning tasks to compute …

Novel data‐placement scheme for improving the data locality of Hadoop in heterogeneous environments

M Bae, S Yeo, G Park, S Oh - Concurrency and Computation …, 2021 - Wiley Online Library
To address the challenging needs of high‐performance big data processing, parallel‐
distributed frameworks such as Hadoop are being utilized extensively. However, in …