[PDF][PDF] Efficient aggregation algorithms for probabilistic data

TS Jayram, S Kale, E Vee - Proceedings of the eighteenth annual ACM …, 2007 - Citeseer
Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 2007Citeseer
We study the problem of computing aggregation operators on probabilistic data in an I/O
efficient manner. Algorithms for aggregation operators such as SUM, COUNT, AVG, and
MIN/MAX are crucial to applications on probabilistic databases. We give a generalization of
the classical data stream model to handle probabilistic data, called probabilistic streams, in
order to analyze the I/O-requirements of our algorithms. Whereas the algorithms for SUM
and COUNT turn out to be simple, the problem is harder for both AVG and MIN/MAX …
Abstract
We study the problem of computing aggregation operators on probabilistic data in an I/O efficient manner. Algorithms for aggregation operators such as SUM, COUNT, AVG, and MIN/MAX are crucial to applications on probabilistic databases. We give a generalization of the classical data stream model to handle probabilistic data, called probabilistic streams, in order to analyze the I/O-requirements of our algorithms. Whereas the algorithms for SUM and COUNT turn out to be simple, the problem is harder for both AVG and MIN/MAX. Although data stream algorithms typically use randomness, all of the algorithms we present are deterministic. For MIN and MAX, we obtain efficient one-pass data stream algorithms for estimating each of these quantities with relative accuracy (1+ ε), using constant update time per element and O (1 ε lg R) space, where each element has a value between 1 and R. For AVG, we present a new data stream algorithm for estimating its value to a relative accuracy (1+ ϵ) in O (log n) passes over the data with O (1 ε log2 n) space and update time
Citeseer
以上显示的是最相近的搜索结果。 查看全部搜索结果