Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

A survey of large-scale analytical query processing in MapReduce

C Doulkeridis, K Nørvåg - The VLDB journal, 2014 - Springer
Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …

iShuffle: Improving Hadoop performance with shuffle-on-write

Y Guo, J Rao, D Cheng, X Zhou - IEEE transactions on parallel …, 2016 - ieeexplore.ieee.org
Hadoop is a popular implementation of the MapReduce framework for running
data-intensive jobs on clusters of commodity servers. Shuffle, the all-to-all input data fetching …

Using input command pre-shaping to suppress multiple mode vibration

JM Hyde, WP Seering - MIT Space Engineering Research Center, 1990 - ntrs.nasa.gov
Spacecraft, space-borne robotic systems, and manufacturing equipment often utilize
lightweight materials and configurations that give rise to vibration problems. Prior research …

CRESP: Towards optimal resource provisioning for MapReduce computing in public clouds

K Chen, J Powers, S Guo, F Tian - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
Running MapReduce programs in the cloud introduces this unique problem: how to optimize
resource provisioning to minimize the monetary cost or job finish time for a specific job? We …

An intermediate data placement algorithm for load balancing in Spark computing environment

Z Tang, X Zhang, K Li, K Li - Future Generation Computer Systems, 2018 - Elsevier
Since MapReduce became an effective and popular programming framework for parallel
data processing, key skew in intermediate data has become one of the important system …

MapReduce data skewness handling: a systematic literature review

MA Irandoost, AM Rahmani, S Setayeshi - International Journal of Parallel …, 2019 - Springer
One of the most successful techniques in large-scale data-intensive computations is
MapReduce programming. MapReduce is based on a divide and conquer approach that …

Online load balancing for MapReduce with skewed data input

Y Le, J Liu, F Ergün, D Wang - IEEE INFOCOM 2014-IEEE …, 2014 - ieeexplore.ieee.org
MapReduce has emerged as a powerful tool for distributed and scalable processing of
voluminous data. In this paper, we, for the first time, examine the problem of accommodating …

Managing Skew in Hadoop.

YC Kwon, K Ren, M Balazinska, B Howe, J Rolia - IEEE Data Eng. Bull., 2013 - cs.cmu.edu
Abstract Challenges in Big Data analytics stem not only from volume, but also variety:
extreme diversity in both data types (e.g., text, images, and graphs) and in operations beyond …

Rock you like a hurricane: Taming skew in large scale analytics

L Bindschaedler, J Malicevic, N Schiper… - Proceedings of the …, 2018 - dl.acm.org
Current cluster computing frameworks suffer from load imbalance and limited parallelism
due to skewed data distributions, processing times, and machine speeds. We observe that …