Clash of the titans: MapReduce vs. Spark for large scale data analytics

J Shi, Y Qiu, UF Minhas, L Jiao, C Wang… - Proceedings of the …, 2015 - dl.acm.org
MapReduce and Spark are two very popular open source cluster computing frameworks for
large scale data analytics. These frameworks hide the complexity of task parallelism and …

Large scale distributed data science using Apache Spark

JG Shanahan, L Dai - Proceedings of the 21st ACM SIGKDD …, 2015 - dl.acm.org
Apache Spark is an open-source cluster computing framework for big data processing. It has
emerged as the next generation big data processing engine, overtaking Hadoop …

Map-join-reduce: Toward scalable and efficient data analysis on large clusters

D Jiang, AKH Tung, G Chen - IEEE Transactions on Knowledge …, 2010 - ieeexplore.ieee.org
Data analysis is an important functionality in cloud computing which allows a huge amount
of data to be processed over very large clusters. MapReduce is recognized as a popular …

[BOOK] An architecture for fast and general data processing on large clusters

M Zaharia - 2016 - books.google.com
The past few years have seen a major change in computing systems, as growing data
volumes and stalling processor speeds require more and more applications to scale out to …

Exploring MapReduce efficiency with highly-distributed data

M Cardosa, C Wang, A Nangia, A Chandra… - Proceedings of the …, 2011 - dl.acm.org
MapReduce is a highly-popular paradigm for high-performance computing over large data
sets in large-scale platforms. However, when the source data is widely distributed and the …

MapReduce: simplified data processing on large clusters

J Dean, S Ghemawat - Communications of the ACM, 2008 - dl.acm.org
MapReduce is a programming model and an associated implementation for processing and
generating large datasets that is amenable to a broad variety of real-world tasks. Users …
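The programming model summarized above (a user-supplied map function emitting intermediate key/value pairs, a grouping of those pairs by key, and a user-supplied reduce function merging each key's values) can be illustrated with word counting, the running example in Dean and Ghemawat's paper. The following is a minimal single-process sketch, not Google's distributed implementation: there is no partitioning, fault tolerance, or parallelism, and the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical. It also simplifies the paper's map signature by passing each input line directly instead of a (document name, contents) pair.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory MapReduce driver (illustration only)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle step: collect all intermediate values under their key.
            groups[key].append(value)
    # Reduce phase: merge the value list of each key (keys visited in sorted order).
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Total occurrences of one word across all input records.
    return sum(counts)


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs, word_count_mapper, word_count_reducer))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Keeping the driver generic over `mapper` and `reducer` mirrors the split the paper describes: users write only the two functions, while the framework owns the grouping step (and, in a real cluster, distribution and failure handling).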

The family of MapReduce and large-scale data processing systems

S Sakr, A Liu, AG Fayoumi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …

Parallel data processing with MapReduce: a survey

KH Lee, YJ Lee, H Choi, YD Chung, B Moon - ACM SIGMOD Record, 2012 - dl.acm.org
A prominent parallel data processing tool MapReduce is gaining significant momentum from
both industry and academia as the volume of data to analyze grows rapidly. While …

Themis: An I/O-efficient MapReduce

A Rasmussen, VT Lam, M Conley, G Porter… - Proceedings of the …, 2012 - dl.acm.org
"Big Data" computing increasingly utilizes the MapReduce programming model for scalable
processing of large data collections. Many MapReduce jobs are I/O-bound, and so …

Spark: Cluster computing with working sets

M Zaharia, M Chowdhury, MJ Franklin… - 2nd USENIX workshop …, 2010 - usenix.org
MapReduce and its variants have been highly successful in implementing large-scale data-
intensive applications on commodity clusters. However, most of these systems are built …