Authors
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, Russell Sears
Publication date
2010/6/6
Book
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Pages
1115-1118
Description
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this demonstration, we describe a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We demonstrate a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop …
Total citations
[Chart: citations per year, 2009 to 2023]