Reining in the outliers in Map-Reduce clusters using Mantri

G Ananthanarayanan, S Kandula… - … USENIX Symposium on …, 2010 - usenix.org
Experience from an operational Map-Reduce cluster reveals that outliers significantly
prolong job completion. The causes for outliers include run-time contention for processor …

Balancing reducer skew in MapReduce workloads using progressive sampling

SR Ramakrishnan, G Swart, A Urmanov - Proceedings of the Third ACM …, 2012 - dl.acm.org
The elapsed time of a parallel job depends on the completion time of its longest running
constituent. We present a static load balancing algorithm that distributes work evenly across …

Breaking the MapReduce stage barrier

A Verma, B Cho, N Zea, I Gupta, RH Campbell - Cluster computing, 2013 - Springer
The MapReduce model uses a barrier between the Map and Reduce stages. This provides
simplicity in both programming and implementation. However, in many situations, this barrier …

Clash of the titans: MapReduce vs. Spark for large scale data analytics

J Shi, Y Qiu, UF Minhas, L Jiao, C Wang… - Proceedings of the …, 2015 - dl.acm.org
MapReduce and Spark are two very popular open source cluster computing frameworks for
large scale data analytics. These frameworks hide the complexity of task parallelism and …

Adaptive MapReduce using situation-aware mappers

R Vernica, A Balmin, KS Beyer… - Proceedings of the 15th …, 2012 - dl.acm.org
We propose new adaptive runtime techniques for MapReduce that improve performance
and simplify job tuning. We implement these techniques by breaking a key assumption of …

MapReduce optimization using regulated dynamic prioritization

T Sandholm, K Lai - Proceedings of the eleventh international joint …, 2009 - dl.acm.org
We present a system for allocating resources in shared data and compute clusters that
improves MapReduce job scheduling in three ways. First, the system uses regulated and …

Scarlett: coping with skewed content popularity in MapReduce clusters

G Ananthanarayanan, S Agarwal, S Kandula… - Proceedings of the sixth …, 2011 - dl.acm.org
To improve data availability and resilience MapReduce frameworks use file systems that
replicate data uniformly. However, analysis of job logs from a large production cluster shows …

Delay tails in MapReduce scheduling

J Tan, X Meng, L Zhang - Proceedings of the 12th ACM SIGMETRICS …, 2012 - dl.acm.org
MapReduce/Hadoop production clusters exhibit heavy-tailed characteristics for job
processing times. These phenomena are resultant of the workload features and the adopted …

ARIA: automatic resource inference and allocation for MapReduce environments

A Verma, L Cherkasova, RH Campbell - Proceedings of the 8th ACM …, 2011 - dl.acm.org
MapReduce and Hadoop represent an economically compelling alternative for efficient
large scale data processing and advanced analytics in the enterprise. A key challenge in …

MapReduce: simplified data processing on large clusters

J Dean, S Ghemawat - Communications of the ACM, 2008 - dl.acm.org
MapReduce is a programming model and an associated implementation for processing and
generating large datasets that is amenable to a broad variety of real-world tasks. Users …