W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins many high-level applications based on similarity measures in machine learning and data …
Network measurement is important to data center operators. Most existing efforts focus on developing new implementation schemes for measurement tasks. Little attention is paid to …
P Wang, Y Qi, Y Zhang, Q Zhai, C Wang… - Proceedings of the 25th …, 2019 - dl.acm.org
Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases, machine learning, and information retrieval. MinHash is a well-known …
D Yang, B Li, L Rettig… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the …
T Christiani, R Pagh - Proceedings of the 49th annual ACM SIGACT …, 2017 - dl.acm.org
We consider the problem of approximate set similarity search under Braun-Blanquet similarity B(x, y) = |x ∩ y| / max(|x|, |y|). The (b_1, b_2)-approximate Braun-Blanquet similarity …
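As a small illustration of the Braun-Blanquet similarity defined in the snippet above, the following sketch computes it for two Python sets. The function name `braun_blanquet` and the choice to return 0.0 for two empty sets are assumptions made for this example; they are not taken from the paper.

```python
def braun_blanquet(x: set, y: set) -> float:
    """Braun-Blanquet similarity: |x ∩ y| / max(|x|, |y|)."""
    largest = max(len(x), len(y))
    if largest == 0:
        # Assumption: treat two empty sets as having similarity 0.0,
        # since the formula is undefined when both sets are empty.
        return 0.0
    return len(x & y) / largest


# The sets share two elements and the larger set has four,
# so the similarity is 2 / 4.
print(braun_blanquet({1, 2, 3}, {2, 3, 4, 5}))
```

Because the denominator is the size of the larger set, the value is at most 1 and reaches 1 only when the two sets are identical.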
DA Rachkovskij - Cybernetics and Systems Analysis, 2017 - Springer
This review considers methods and algorithms for fast estimation of distance/similarity measures between initial data from vector representations with binary or integer-valued …
As an efficient tool for approximate similarity computation and search, Locality Sensitive Hashing (LSH) has been widely used in many research areas including databases, data …
O Ertl - arXiv preprint arXiv:2101.00314, 2021 - arxiv.org
MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting different …
ES Dalmaijer - arXiv preprint arXiv:2309.00866, 2023 - arxiv.org
Before embarking on data collection, researchers typically compute how many individual observations they should do. This is vital for doing studies with sufficient statistical power …