Balancing reducer skew in MapReduce workloads using progressive sampling

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org

MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

Cited by 248 - Related articles - All 15 versions

[PDF] princeton.edu

A survey of large-scale analytical query processing in MapReduce

C Doulkeridis, K Nørvåg - The VLDB Journal, 2014 - Springer

Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …

Cited by 339 - Related articles - All 15 versions

[PDF] usenix.org

iShuffle: Improving Hadoop performance with shuffle-on-write

Y Guo, J Rao, D Cheng, X Zhou - IEEE transactions on parallel …, 2016 - ieeexplore.ieee.org

Hadoop is a popular implementation of the MapReduce framework for running data-
intensive jobs on clusters of commodity servers. Shuffle, the all-to-all input data fetching …

Cited by 193 - Related articles - All 10 versions

[PDF] nasa.gov

Using input command pre-shaping to suppress multiple mode vibration

JM Hyde, WP Seering - MIT Space Engineering Research Center, 1990 - ntrs.nasa.gov

Spacecraft, space-borne robotic systems, and manufacturing equipment often utilize
lightweight materials and configurations that give rise to vibration problems. Prior research …

Cited by 274 - Related articles - All 8 versions

[PDF] academia.edu

CRESP: Towards optimal resource provisioning for MapReduce computing in public clouds

K Chen, J Powers, S Guo, F Tian - IEEE Transactions on …, 2013 - ieeexplore.ieee.org

Running MapReduce programs in the cloud introduces this unique problem: how to optimize
resource provisioning to minimize the monetary cost or job finish time for a specific job? We …

Cited by 113 - Related articles - All 9 versions

[PDF] newpaltz.edu

An intermediate data placement algorithm for load balancing in spark computing environment

Z Tang, X Zhang, K Li, K Li - Future Generation Computer Systems, 2018 - Elsevier

Since MapReduce became an effective and popular programming framework for parallel
data processing, key skew in intermediate data has become one of the important system …

Cited by 82 - Related articles - All 2 versions

MapReduce data skewness handling: a systematic literature review

MA Irandoost, AM Rahmani, S Setayeshi - International Journal of Parallel …, 2019 - Springer

One of the most successful techniques in large-scale data-intensive computations is
MapReduce programming. MapReduce is based on a divide and conquer approach that …

Cited by 16 - Related articles - All 4 versions

[PDF] psu.edu

Online load balancing for MapReduce with skewed data input

Y Le, J Liu, F Ergün, D Wang - IEEE INFOCOM 2014-IEEE …, 2014 - ieeexplore.ieee.org

MapReduce has emerged as a powerful tool for distributed and scalable processing of
voluminous data. In this paper, we, for the first time, examine the problem of accommodating …

Cited by 73 - Related articles - All 12 versions

[PDF] cmu.edu

[PDF] Managing Skew in Hadoop.

YC Kwon, K Ren, M Balazinska, B Howe, J Rolia - IEEE Data Eng. Bull., 2013 - cs.cmu.edu

Abstract Challenges in Big Data analytics stem not only from volume, but also variety:
extreme diversity in both data types (e.g., text, images, and graphs) and in operations beyond …

Cited by 73 - Related articles - All 6 versions

[PDF] epfl.ch

Rock you like a hurricane: Taming skew in large scale analytics

L Bindschaedler, J Malicevic, N Schiper… - Proceedings of the …, 2018 - dl.acm.org

Current cluster computing frameworks suffer from load imbalance and limited parallelism
due to skewed data distributions, processing times, and machine speeds. We observe that …

Cited by 40 - Related articles - All 12 versions
