Query evaluation techniques for large databases

G Graefe - ACM Computing Surveys (CSUR), 1993 - dl.acm.org
Database management systems will continue to manage large data volumes. Thus, efficient
algorithms for accessing and manipulating large sets and sequences will be required to …

Random sampling from databases: a survey

F Olken, D Rotem - Statistics and Computing, 1995 - Springer
This paper reviews recent literature on techniques for obtaining random samples from
databases. We begin with a discussion of why one would want to include sampling facilities …
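One classic way to draw a fixed-size simple random sample in a single pass over a table whose size is not known in advance is reservoir sampling (Algorithm R). The sketch below is our own illustration of that technique, not code from the survey. The function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(rows: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform random sample of k rows (without replacement) in one pass.

    Hypothetical helper illustrating reservoir sampling (Algorithm R).
    If the input has fewer than k rows, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) enters with probability k / (i + 1),
            # evicting a uniformly chosen current member.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 5, random.Random(42))` returns five distinct row values while holding only five rows in memory at any time.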

Profiling relational data: a survey

Z Abedjan, L Golab, F Naumann - The VLDB Journal, 2015 - Springer
Profiling data to determine metadata about a given dataset is an important and frequent
activity of any IT professional and researcher and is necessary for various use-cases. It …
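Basic single-column profiling computes metadata such as row count, null count, number of distinct values, uniqueness, and value range. A minimal sketch, assuming a column arrives as a Python iterable with `None` standing for NULL and mutually comparable non-null values. The helper name `profile_column` and the particular set of statistics are illustrative choices, not taken from the survey.

```python
from collections import Counter
from typing import Any, Dict, Iterable

def profile_column(values: Iterable[Any]) -> Dict[str, Any]:
    """Compute a few basic single-column statistics (hypothetical helper).

    Assumes None represents NULL and that non-null values are comparable.
    """
    counts: Counter = Counter()
    nulls = 0
    total = 0
    for v in values:
        total += 1
        if v is None:
            nulls += 1
        else:
            counts[v] += 1
    non_null = list(counts)
    return {
        "rows": total,
        "nulls": nulls,
        "distinct": len(counts),
        # Unique-column candidate: no non-null value occurs more than once.
        "unique": all(c == 1 for c in counts.values()),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0] if counts else None,
    }
```

An exact distinct count like this needs memory proportional to the number of distinct values, which is why large-scale profiling often turns to the approximate counting and sampling techniques listed elsewhere in these results.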

A linear-time probabilistic counting algorithm for database applications

KY Whang, BT Vander-Zanden, HM Taylor - ACM Transactions on …, 1990 - dl.acm.org
We present a probabilistic algorithm for counting the number of unique values in the
presence of duplicates. This algorithm has O(q) time complexity, where q is the number of …
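The snippet does not spell out the algorithm. As we understand linear counting, each value is hashed to one of m bits in a bitmap, and the distinct count is estimated from the fraction V of bits still zero as n̂ = -m ln V, which takes one pass over the q input values. The sketch below assumes that formulation; the hash (BLAKE2b over `repr`), the default bitmap size, and the name `linear_count` are our own choices, not the paper's.

```python
import hashlib
import math
from typing import Hashable, Iterable

def linear_count(values: Iterable[Hashable], m: int = 1 << 16) -> float:
    """Estimate the number of distinct values with an m-bit bitmap.

    Hash each value to one bit position and set it; duplicates hit the
    same bit. Estimate n as m * ln(m / zeros) = -m * ln(V), where
    V = zeros / m is the fraction of bits still unset.
    """
    bitmap = bytearray(m)  # one byte per bit position, for simplicity
    for v in values:
        digest = hashlib.blake2b(repr(v).encode(), digest_size=8).digest()
        bitmap[int.from_bytes(digest, "big") % m] = 1
    zeros = m - sum(bitmap)
    if zeros == 0:
        raise ValueError("bitmap is full; use a larger m")
    return m * math.log(m / zeros)
```

The bitmap must be sized relative to the expected number of distinct values: if it fills up, no estimate is possible, and accuracy degrades as the fraction of zero bits shrinks.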

Random sampling from databases

F Olken - 1993 - Citeseer
This thesis has its origins in the 1980 dissertation of Jack Morgenstein [Mor80] and in
discussions at the Second Workshop on Statistical Data Management hosted by Lawrence …

RainForest—a framework for fast decision tree construction of large datasets

J Gehrke, R Ramakrishnan, V Ganti - Data Mining and Knowledge …, 2000 - Springer
Classification of large datasets is an important data mining problem. Many classification
algorithms have been proposed in the literature, but studies have shown that so far no …
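As we understand RainForest, its central observation is that choosing a split at a tree node requires only per-attribute AVC-sets (counts of each attribute-value, class-label pair over the node's data), which are usually far smaller than the data partition itself. The sketch below builds AVC-sets in one scan and scores a multiway split on a categorical attribute with the Gini index; the function names and the choice of Gini with multiway splits are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# AVC-set for one attribute: attribute value -> class label -> count
AVCSet = Dict[Hashable, Dict[Hashable, int]]

def build_avc_sets(rows: Sequence[Tuple[Sequence[Hashable], Hashable]],
                   n_attrs: int) -> List[AVCSet]:
    """One scan over (attribute_values, class_label) rows -> one AVC-set per attribute."""
    avc: List[AVCSet] = [defaultdict(lambda: defaultdict(int)) for _ in range(n_attrs)]
    for attrs, label in rows:
        for a in range(n_attrs):
            avc[a][attrs[a]][label] += 1
    return avc

def gini(class_counts: Dict[Hashable, int]) -> float:
    """Gini impurity of a class-count distribution."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts.values())

def weighted_gini_multiway(avc_set: AVCSet) -> float:
    """Impurity of a multiway split on a categorical attribute, from its AVC-set alone."""
    sizes = {v: sum(counts.values()) for v, counts in avc_set.items()}
    n = sum(sizes.values())
    if n == 0:
        return 0.0
    return sum(sizes[v] / n * gini(counts) for v, counts in avc_set.items())
```

The split evaluation never touches the raw rows again: once the AVC-sets fit in memory, the attribute with the lowest weighted impurity can be selected from them directly.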

Sampling-based estimation of the number of distinct values of an attribute

PJ Haas, JF Naughton, S Seshadri, L Stokes - VLDB, 1995 - vldb.org
We provide several new sampling-based estimators of the number of distinct values of an
attribute in a relation. We compare these new estimators to estimators from the database …

Physical database design for relational databases

S Finkelstein, M Schkolnick, P Tiberio - ACM Transactions on Database …, 1988 - dl.acm.org
This paper describes the concepts used in the implementation of DBDSGN, an experimental
physical design tool for relational databases developed at the IBM San Jose Research …

Mc2: Rigorous and efficient directed greybox fuzzing

A Shah, D She, S Sadhu, K Singal, P Coffman… - Proceedings of the …, 2022 - dl.acm.org
Directed greybox fuzzing is a popular technique for targeted software testing that seeks to
find inputs that reach a set of target sites in a program. Most existing directed greybox …

Towards estimation error guarantees for distinct values

M Charikar, S Chaudhuri, R Motwani… - Proceedings of the …, 2000 - dl.acm.org
We consider the problem of estimating the number of distinct values in a column of a table.
For large tables without an index on the column, random sampling appears to be the only …
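One estimator associated with this line of work is the Guaranteed-Error Estimator (GEE): with table size N, sample size r, and f_j the number of distinct values seen exactly j times in the sample, it estimates D̂ = sqrt(N/r) · f_1 + Σ_{j≥2} f_j. The sketch below assumes that definition and a uniform random sample of rows; the name `gee_estimate` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def gee_estimate(sample: Sequence[Hashable], table_size: int) -> float:
    """GEE-style estimate of the number of distinct values in a column.

    Values seen exactly once in the sample are scaled up by
    sqrt(table_size / sample_size); values seen two or more times
    are counted once each.
    """
    r = len(sample)
    if r == 0:
        raise ValueError("sample must be non-empty")
    # freq_of_freq[j] = f_j, the number of values occurring exactly j times
    freq_of_freq = Counter(Counter(sample).values())
    f1 = freq_of_freq.get(1, 0)
    repeated = sum(cnt for j, cnt in freq_of_freq.items() if j >= 2)
    return math.sqrt(table_size / r) * f1 + repeated
```

The square-root scaling of singletons is a compromise between treating each singleton as a single value (too low when the column has many rare values) and scaling it by the full N/r (too high when most values are common).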