Context: In recent years, the valuable knowledge that can be retrieved from petabyte-scale datasets, known as Big Data, has led to the development of solutions to process information …
There has been much research devoted to improving the performance of data analytics frameworks, but comparatively little effort has been spent systematically identifying the …
Sprocket is a highly configurable, stage-based, scalable, serverless video processing framework that exploits intra-video parallelism to achieve low latency. Sprocket enables …
Entity matching (EM) identifies data instances that refer to the same real-world entity, such as (David Smith, UW-Madison) and (DM Smith, UWM). This problem has been a long …
This paper describes a learning-based approach to the acceleration of approximate programs. We describe the Parrot transformation, a program transformation that selects and …
Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified …
We study the problem of computing conjunctive queries over large databases on parallel architectures without shared storage. Using the structure of such a query q and the skew in …
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial …