Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

A comprehensive view of Hadoop research—A systematic literature review

I Polato, R Ré, A Goldman, F Kon - Journal of Network and Computer …, 2014 - Elsevier
Context: In recent years, the valuable knowledge that can be retrieved from petabyte scale
datasets–known as Big Data–led to the development of solutions to process information …

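[PDF] Survey on NoSQL for management of big data

DR Shen, G Yu, XT Wang, TZ Nie, Y Kou - Journal of Software (软件学报), 2013 - jos.org.cn
In response to the new requirements of big data management, many application-specific NoSQL
database systems have emerged. This paper surveys research on NoSQL databases based on the
key-value data model. First, it introduces the characteristics of big data and the support for big …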

Making sense of performance in data analytics frameworks

K Ousterhout, R Rasti, S Ratnasamy… - … USENIX Symposium on …, 2015 - usenix.org
There has been much research devoted to improving the performance of data analytics
frameworks, but comparatively little effort has been spent systematically identifying the …

Sprocket: A serverless video processing framework

L Ao, L Izhikevich, GM Voelker, G Porter - Proceedings of the ACM …, 2018 - dl.acm.org
Sprocket is a highly configurable, stage-based, scalable, serverless video processing
framework that exploits intra-video parallelism to achieve low latency. Sprocket enables …

[BOOK] Magellan: Toward building entity matching management systems

PV Konda - 2018 - search.proquest.com
Entity matching (EM) identifies data instances that refer to the same real-world entity, such
as (David Smith, UWMadison) and (DM Smith, UWM). This problem has been a long …

Neural acceleration for general-purpose approximate programs

H Esmaeilzadeh, A Sampson, L Ceze… - 2012 45th annual …, 2012 - ieeexplore.ieee.org
This paper describes a learning-based approach to the acceleration of approximate
programs. We describe the Parrot transformation, a program transformation that selects and …

Shark: SQL and rich analytics at scale

RS Xin, J Rosen, M Zaharia, MJ Franklin… - Proceedings of the …, 2013 - dl.acm.org
Shark is a new data analysis system that marries query processing with complex analytics
on large clusters. It leverages a novel distributed memory abstraction to provide a unified …

Communication steps for parallel query processing

P Beame, P Koutris, D Suciu - Journal of the ACM (JACM), 2017 - dl.acm.org
We study the problem of computing conjunctive queries over large databases on parallel
architectures without shared storage. Using the structure of such a query q and the skew in …

LocationSpark: A distributed in-memory data management system for big spatial data

M Tang, Y Yu, QM Malluhi, M Ouzzani… - Proceedings of the VLDB …, 2016 - dl.acm.org
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a
widely used distributed data processing system. LocationSpark offers a rich set of spatial …