SketchVisor: Robust network measurement for software packet processing

Q Huang, X Jin, PPC Lee, R Li, L Tang… - Proceedings of the …, 2017 - dl.acm.org
Network measurement remains a missing piece in today's software packet processing
platforms. Sketches provide a promising building block for filling this void by monitoring …

A review for weighted MinHash algorithms

W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …

FlyMon: enabling on-the-fly task reconfiguration for network measurement

H Zheng, C Tian, T Yang, H Lin, C Liu… - Proceedings of the …, 2022 - dl.acm.org
Network measurement is important to data center operators. Most existing efforts focus on
developing new implementation schemes for measurement tasks. Little attention is paid to …

A memory-efficient sketch method for estimating high similarities in streaming sets

P Wang, Y Qi, Y Zhang, Q Zhai, C Wang… - Proceedings of the 25th …, 2019 - dl.acm.org
Estimating set similarity and detecting highly similar sets are fundamental problems in areas
such as databases, machine learning, and information retrieval. MinHash is a well-known …

HistoSketch: Fast similarity-preserving sketching of streaming histograms with concept drift

D Yang, B Li, L Rettig… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Histogram-based similarity has been widely adopted in many machine learning tasks.
However, measuring histogram similarity is a challenging task for streaming data, where the …

Set similarity search beyond MinHash

T Christiani, R Pagh - Proceedings of the 49th annual ACM SIGACT …, 2017 - dl.acm.org
We consider the problem of approximate set similarity search under Braun-Blanquet
similarity B(x, y) = |x ∩ y| / max(|x|, |y|). The (b_1, b_2)-approximate Braun-Blanquet similarity …

Binary vectors for fast distance and similarity estimation

DA Rachkovskij - Cybernetics and Systems Analysis, 2017 - Springer
This review considers methods and algorithms for fast estimation of distance/similarity
measures between initial data from vector representations with binary or integer-valued …

Bidirectionally densifying LSH sketches with empty bins

P Jia, P Wang, J Zhao, S Zhang, Y Qi, M Hu… - Proceedings of the …, 2021 - dl.acm.org
As an efficient tool for approximate similarity computation and search, Locality Sensitive
Hashing (LSH) has been widely used in many research areas including databases, data …

SetSketch: Filling the gap between MinHash and HyperLogLog

O Ertl - arXiv preprint arXiv:2101.00314, 2021 - arxiv.org
MinHash and HyperLogLog are sketching algorithms that have become indispensable for
set summaries in big data applications. While HyperLogLog allows counting different …

Tutorial: a priori estimation of sample size, effect size, and statistical power for cluster analysis, latent class analysis, and multivariate mixture models

ES Dalmaijer - arXiv preprint arXiv:2309.00866, 2023 - arxiv.org
Before embarking on data collection, researchers typically compute how many individual
observations they should do. This is vital for doing studies with sufficient statistical power …