Scarlett: coping with skewed content popularity in MapReduce clusters

G Ananthanarayanan, S Agarwal, S Kandula… - Proceedings of the sixth …, 2011 - dl.acm.org
To improve data availability and resilience, MapReduce frameworks use file systems that
replicate data uniformly. However, analysis of job logs from a large production cluster shows …

MapReduce optimization using regulated dynamic prioritization

T Sandholm, K Lai - Proceedings of the eleventh international joint …, 2009 - dl.acm.org
We present a system for allocating resources in shared data and compute clusters that
improves MapReduce job scheduling in three ways. First, the system uses regulated and …

Chukwa: a system for reliable large-scale log collection

A Rabkin, R Katz - 24th Large Installation System Administration …, 2010 - usenix.org
Large Internet services companies like Google, Yahoo, and Facebook use the MapReduce
programming model to process log data. MapReduce is designed to work on data stored in …

DynamicMR: A dynamic slot allocation optimization framework for MapReduce clusters

S Tang, BS Lee, B He - IEEE Transactions on Cloud …, 2014 - ieeexplore.ieee.org
MapReduce is a popular computing paradigm for large-scale data processing in cloud
computing. However, the slot-based MapReduce system (e.g., Hadoop MRv1) can suffer from …

ARIA: automatic resource inference and allocation for MapReduce environments

A Verma, L Cherkasova, RH Campbell - Proceedings of the 8th ACM …, 2011 - dl.acm.org
MapReduce and Hadoop represent an economically compelling alternative for efficient
large scale data processing and advanced analytics in the enterprise. A key challenge in …

Exploring MapReduce efficiency with highly-distributed data

M Cardosa, C Wang, A Nangia, A Chandra… - Proceedings of the …, 2011 - dl.acm.org
MapReduce is a highly-popular paradigm for high-performance computing over large data
sets in large-scale platforms. However, when the source data is widely distributed and the …

Trojan data layouts: right shoes for a running elephant

A Jindal, JA Quiané-Ruiz, J Dittrich - … of the 2nd ACM Symposium on …, 2011 - dl.acm.org
MapReduce is becoming ubiquitous in large-scale data analysis. Several recent works have
shown that the performance of Hadoop MapReduce could be improved, for instance, by …

Sailfish: A framework for large scale data processing

S Rao, R Ramakrishnan, A Silberstein… - Proceedings of the …, 2012 - dl.acm.org
In this paper, we present Sailfish, a new Map-Reduce framework for large scale data
processing. The Sailfish design is centered around aggregating intermediate data …

Improving MapReduce performance using smart speculative execution strategy

Q Chen, C Liu, Z Xiao - IEEE Transactions on Computers, 2013 - ieeexplore.ieee.org
MapReduce is a widely used parallel computing framework for large scale data processing.
The two major performance metrics in MapReduce are job execution time and cluster …

Improving MapReduce performance in heterogeneous environments

M Zaharia, A Konwinski, AD Joseph, RH Katz, I Stoica - OSDI, 2008 - usenix.org
MapReduce is emerging as an important programming model for large-scale data-parallel
applications such as web indexing, data mining, and scientific simulation. Hadoop is an …