skip to main content
article
Free access

Ripple joins for online aggregation

Published: 01 June 1999 Publication History

Abstract

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications.
Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of on-line aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.

References

[1]
M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, 1972. Ninth printing.
[2]
P. Billingsley. Probability and Measure. Wiley, New York, 1986.
[3]
T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing She sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.
[4]
D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Michael R. Stonebraker, and D. Wood. Implementation Techniques for Main Memory Database Systems. In Proc. 198g A CM SIGMOD Intl. Conf. Managment of Data, pages 1-8. ACM Press, 1984.
[5]
J. Gray and G. Graefe. The five-minute rule ten years later and other computer storage rules of thumb. SIGMOD Record, 26(4), 1997.
[6]
G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surveys, 25(2):73-170, June 1993.
[7]
P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. In Proc. Ninth Intl. Con}. Scientific and Statist. Database Management, pages 51-63. I}EEE Computer Society Press, 1997.
[8]
J. M. Hellerstein, R. Avnur, and V. Raman. informix under CONTROL: Online query processing. Submitted for publication, 1999.
[9]
P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. IBM Research Report RJ 10126, IBM Almaden Research Center, San Jose, CA, 1998.
[10]
J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In Proc. 1997 ACM SIGMOD Intl. Conf. Managment of Data, pages 171-182. ACM Press, 1997.
[11]
J. M. Hellerstein and J. F. Naughton. Query Execution Techniques for Caching Expensive Methods. in Proc. 1996 A CM SIGMOD Intl. Conf. Managment of Data, pages 423- 424. ACM Press, 1996.
[12]
P. J. Haas, J. F. Naughton, and A. N. Swami. On the relative cost of sampling for join selectivity estimation. In Proc. Thirteenth A CM SIGA CT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 14-24. ACM Press, 1994.
[13]
P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on,'andom sampling. J. Comput. System Sci., 52:550-569, 1996.
[14]
W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293-325, 1948.
[15]
W. Hou, G. Ozsoyoglu, and B. Taneja. Statistical estimators for relational algebra expressions. In Proc. Seventh A CM SIGACT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 276-287. ACM Press, 1988.
[16]
W. Hou, G. Ozsoyoglu, and B. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. 1989 A CM SIGMOD Intl. Conf. Managment of Data, pages 68-77. ACM Press, 1989.
[17]
IBM Corporation. IBM DB2 Universal Database Administration Guide, Version 5. North York, Ontario, Canada, 1997.
[18]
Informix Corporation. Informix Universal Server G~ide to SQL: Syntax, Version 9.01. Menlo Park, CA, March 1997.
[19]
G. Ozsoyoglu, K. Du, A. Tjahjana, W. Hou, and D. Y. Rowland. On estimating COUNT, SUM, and AVERAGE relational algebra queries. In D. Dimitris Karagiannis, editor, Database and Expert Systems Applications, Proceedings of the International Conference in Berlin, Germany, 1991 (DEXA 91), pages 406-412. Springer-Verlag, 1991.
[20]
V. O'day and R. Jeffries. Orienteering in an informati:on landscape: How information seekers get from here to there. In Human Factors in Computing Systems: INTERCHI '93 Conf. Proc., pages 438-445. ACM Press, 1993.
[21]
F. Olken. Random Sampling .from Databases. Ph.D. Dissertation, University of California, Berkeley, CA, 1993. Available as Tech. Report LBL-32883, Lawrence Berkeley Laboratories, Berkeley, CA.
[22]
Oracle Corporation. Oracle8 Server SQL Reference, Release 8.0. Redwood Shores, CA, June 1997.
[23]
PostgreSQL Home Page, 1998. http://www.postgre,aql .org.
[24]
V. Raman, B. Raman, and J. M. Hellerstein. Online dynamic reordering for interactive data processing. Technical Report UCB//CSD-99-1043, Computer Science Division, UC Berkeley, 1999. Submitted for publication.
[25]
R. Ramakrishnan, D. Srivastava, and S. Sudarshan. Rtde ordering in bottom-up fixpoint evaluation of logic prograrrts. Trans. Knowledge and Data Engrg., 6(4):501-517, 1994.
[26]
A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Info. Sys. (PDIS), pages 68-77. IEEE Computer Society Press, 1991.

Cited By

View all
  • (2023)PlexusProceedings of the 2023 ACM Symposium on Cloud Computing10.1145/3620678.3624643(1-16)Online publication date: 30-Oct-2023
  • (2022)Revisiting the Design of Parallel Stream Joins on Trusted Execution EnvironmentsAlgorithms10.3390/a1506018315:6(183)Online publication date: 25-May-2022
  • (2022)Lightweight and Accurate Cardinality Estimation by Neural Network Gaussian ProcessProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526156(973-987)Online publication date: 10-Jun-2022
  • Show More Cited By

Index Terms

  1. Ripple joins for online aggregation

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGMOD Record
    ACM SIGMOD Record  Volume 28, Issue 2
    June 1999
    599 pages
    ISSN:0163-5808
    DOI:10.1145/304181
    Issue’s Table of Contents
    • cover image ACM Conferences
      SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
      June 1999
      604 pages
      ISBN:1581130848
      DOI:10.1145/304182
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 1999
    Published in SIGMOD Volume 28, Issue 2

    Check for updates

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)273
    • Downloads (Last 6 weeks)43
    Reflects downloads up to 22 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)PlexusProceedings of the 2023 ACM Symposium on Cloud Computing10.1145/3620678.3624643(1-16)Online publication date: 30-Oct-2023
    • (2022)Revisiting the Design of Parallel Stream Joins on Trusted Execution EnvironmentsAlgorithms10.3390/a1506018315:6(183)Online publication date: 25-May-2022
    • (2022)Lightweight and Accurate Cardinality Estimation by Neural Network Gaussian ProcessProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526156(973-987)Online publication date: 10-Jun-2022
    • (2022)A Sampling-based Learning Framework for Big DatabasesProceedings of the ACM Web Conference 202210.1145/3485447.3511991(1871-1881)Online publication date: 25-Apr-2022
    • (2022)Impact of Cognitive Biases on Progressive VisualizationIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2021.305101328:9(3093-3112)Online publication date: 1-Sep-2022
    • (2022)Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00263(2914-2927)Online publication date: May-2022
    • (2022)Salvaging failing and straggling queries2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00108(1382-1395)Online publication date: May-2022
    • (2022)PSATop-kKnowledge-Based Systems10.1016/j.knosys.2021.107614235:COnline publication date: 10-Jan-2022
    • (2022)A random walk sampling on knowledge graphs for semantic-oriented statistical tasksData & Knowledge Engineering10.1016/j.datak.2022.102024140:COnline publication date: 1-Jul-2022
    • (2022)A hybrid prediction and search approach for flexible and efficient exploration of big dataJournal of Visualization10.1007/s12650-022-00887-y26:2(457-475)Online publication date: 6-Oct-2022
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media