[BOOK] An architecture for fast and general data processing on large clusters

M Zaharia - 2016 - books.google.com
The past few years have seen a major change in computing systems, as growing data
volumes and stalling processor speeds require more and more applications to scale out to …

[PDF] Spark: Cluster computing with working sets

M Zaharia, M Chowdhury, MJ Franklin… - 2nd USENIX workshop …, 2010 - usenix.org
MapReduce and its variants have been highly successful in implementing large-scale
data-intensive applications on commodity clusters. However, most of these systems are built …

Clash of the titans: MapReduce vs. Spark for large scale data analytics

J Shi, Y Qiu, UF Minhas, L Jiao, C Wang… - Proceedings of the …, 2015 - dl.acm.org
MapReduce and Spark are two very popular open source cluster computing frameworks for
large scale data analytics. These frameworks hide the complexity of task parallelism and …

Large scale distributed data science using Apache Spark

JG Shanahan, L Dai - Proceedings of the 21th ACM SIGKDD …, 2015 - dl.acm.org
Apache Spark is an open-source cluster computing framework for big data processing. It has
emerged as the next generation big data processing engine, overtaking Hadoop …

SCOPE: parallel databases meet MapReduce

J Zhou, N Bruno, MC Wu, PA Larson, R Chaiken… - The VLDB Journal, 2012 - Springer
Companies providing cloud-scale data services have increasing needs to store and analyze
massive data sets, such as search logs, click streams, and web graph data. For cost and …

MapReduce: simplified data processing on large clusters

J Dean, S Ghemawat - Communications of the ACM, 2008 - dl.acm.org
MapReduce is a programming model and an associated implementation for processing and
generating large datasets that is amenable to a broad variety of real-world tasks. Users …

Disco: a computing platform for large-scale data analytics

P Mundkur, V Tuulos, J Flatow - Proceedings of the 10th ACM SIGPLAN …, 2011 - dl.acm.org
We describe the design and implementation of Disco, a distributed computing platform for
MapReduce style computations on large-scale data. Disco is designed for operation in …

Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

Muppet: MapReduce-style processing of fast data

W Lam, L Liu, STS Prasad, A Rajaraman… - arXiv preprint arXiv …, 2012 - arxiv.org
MapReduce has emerged as a popular method to process big data. In the past few years,
however, not just big data, but fast data has also exploded in volume and availability …

Clydesdale: structured data processing on MapReduce

T Kaldewey, EJ Shekita, S Tata - … of the 15th international conference on …, 2012 - dl.acm.org
MapReduce has emerged as a promising architecture for large scale data analytics on
commodity clusters. The rapid adoption of Hive, a SQL-like data processing language on …