A catalog of stream processing optimizations

M Hirzel, R Soulé, S Schneider, B Gedik… - ACM Computing Surveys …, 2014 - dl.acm.org
Various research communities have independently arrived at stream processing as a
programming model for efficient and parallel computing. These communities include digital …

Adaptive query processing

A Deshpande, Z Ives, V Raman - Foundations and Trends® …, 2007 - nowpublishers.com
As the data management field has diversified to consider settings in which queries are
increasingly complex, statistics are less available, or data is stored remotely, there has been …

MapReduce: simplified data processing on large clusters

J Dean, S Ghemawat - Communications of the ACM, 2008 - dl.acm.org
MapReduce is a programming model and an associated implementation for processing and
generating large datasets that is amenable to a broad variety of real-world tasks. Users …

MapReduce: Simplified data processing on large clusters

J Dean, S Ghemawat - 2004 - usenix.org
MapReduce is a programming model and an associated implementation for processing and
generating large data sets. Users specify a _map_ function that processes a key/value pair …

The Google file system

S Ghemawat, H Gobioff, ST Leung - Proceedings of the nineteenth ACM …, 2003 - dl.acm.org
We have designed and implemented the Google File System, a scalable distributed file
system for large distributed data-intensive applications. It provides fault tolerance while …

Eddies: Continuously adaptive query processing

R Avnur, JM Hellerstein - Proceedings of the 2000 ACM SIGMOD …, 2000 - dl.acm.org
In large federated and shared-nothing databases, resources can exhibit widely fluctuating
characteristics. Assumptions made at the time a query is submitted will rarely hold …

Flux: An adaptive partitioning operator for continuous query systems

MA Shah, JM Hellerstein… - … Conference on Data …, 2003 - ieeexplore.ieee.org
The long-running nature of continuous queries poses new scalability challenges for dataflow
processing. CQ systems execute pipelined dataflows that may be shared across multiple …

Hippodrome: running circles around storage administration

E Anderson, M Hobbs, K Keeton, S Spence… - Conference on File and …, 2002 - usenix.org
Storage system configuration, even at the enterprise scale, is traditionally undertaken by
human experts using a time-consuming process of trial and error, guided by simple rules of …

Adaptive query processing: Technology in evolution

JM Hellerstein, MJ Franklin, S Chandrasekaran… - IEEE Data Eng …, 2000 - cs.cmu.edu
As query engines are scaled and federated, they must cope with highly unpredictable and
changeable environments. In the Telegraph project, we are attempting to architect and …

Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey

AS Szalay, PZ Kunszt, A Thakar, J Gray, D Slutz… - ACM SIGMOD …, 2000 - dl.acm.org
The next-generation astronomy digital archives will cover most of the sky at fine resolution in
many wavelengths, from X-rays, through ultraviolet, optical, and infrared. The archives will …