Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm

S Heule, M Nunkesser, A Hall - … of the 16th International Conference on …, 2013 - dl.acm.org
Cardinality estimation has a wide range of applications and is of particular importance in
database systems. Various algorithms have been proposed in the past, and the …

[BOOK] Machine learning for data streams: with practical examples in MOA

A Bifet, R Gavalda, G Holmes, B Pfahringer - 2023 - books.google.com
A hands-on approach to tasks and techniques in data stream mining and real-time analytics,
with examples in MOA, a popular freely available open-source software framework. Today …

Synopses for massive data: Samples, histograms, wavelets, sketches

G Cormode, M Garofalakis, PJ Haas… - … and Trends® in …, 2011 - nowpublishers.com
Methods for Approximate Query Processing (AQP) are essential for dealing with
massive data. They are often the only means of providing interactive response times when …

Finding frequent items in data streams

G Cormode, M Hadjieleftheriou - Proceedings of the VLDB Endowment, 2008 - dl.acm.org
The frequent items problem is to process a stream of items and find all items occurring more
than a given fraction of the time. It is one of the most heavily studied problems in data stream …

Cardinality estimation: An experimental survey

H Harmouch, F Naumann - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data preparation and data profiling comprise many both basic and complex tasks to analyze
a dataset at hand and extract metadata, such as data distributions, key candidates, and …

Methods for finding frequent items in data streams

G Cormode, M Hadjieleftheriou - The VLDB Journal, 2010 - Springer
The frequent items problem is to process a stream of items and find all items occurring more
than a given fraction of the time. It is one of the most heavily studied problems in data stream …

[PDF] Sketch techniques for approximate query processing

G Cormode - Foundations and Trends in Databases …, 2011 - archive.dimacs.rutgers.edu
Sketch techniques have undergone extensive development within the past few years. They
are especially appropriate for the data streaming scenario, in which large quantities of data …

Approximate distinct counts for billions of datasets

D Ting - Proceedings of the 2019 International Conference on …, 2019 - dl.acm.org
Cardinality estimation plays an important role in processing big data. We consider the
challenging problem of computing millions or more distinct count aggregations in a single …

Panakos: Chasing the tails for multidimensional data streams

F Zhao, PI Khan, D Agrawal, AE Abbadi… - Proceedings of the …, 2023 - dl.acm.org
System operators are often interested in extracting different feature streams from multi-
dimensional data streams; and reporting their distributions at regular intervals, including the …

Skt: A one-pass multi-sketch data analytics accelerator

M Chiosa, TB Preußer… - Proceedings of the …, 2021 - research-collection.ethz.ch
Data analysts often need to characterize a data stream as a first step to its further
processing. Some of the initial insights to be gained include, e.g., the cardinality of the data …