A review for weighted MinHash algorithms

W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …

Hashing-accelerated graph neural networks for link prediction

W Wu, B Li, C Luo, W Nejdl - Proceedings of the Web Conference 2021, 2021 - dl.acm.org
Networks are ubiquitous in the real world. Link prediction, as one of the key problems for
network-structured data, aims to predict whether there exists a link between two nodes. The …

Locality sensitive hashing in Fourier frequency domain for soft set containment search

I Roy, R Agarwal, S Chakrabarti… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many search applications related to passage retrieval, text entailment, and subgraph
search, the query and each 'document' is a set of elements, with a document being relevant if …

A memory-efficient sketch method for estimating high similarities in streaming sets

P Wang, Y Qi, Y Zhang, Q Zhai, C Wang… - Proceedings of the 25th …, 2019 - dl.acm.org
Estimating set similarity and detecting highly similar sets are fundamental problems in areas
such as databases, machine learning, and information retrieval. MinHash is a well-known …

Efficient attributed network embedding via recursive randomized hashing

W Wu, B Li, L Chen, C Zhang - IJCAI international joint …, 2018 - opus.lib.uts.edu.au
© 2018 International Joint Conferences on Artificial Intelligence. All rights reserved. Attributed
network embedding aims to learn a low-dimensional representation for each node of a …

BagMinHash - minwise hashing algorithm for weighted sets

O Ertl - Proceedings of the 24th ACM SIGKDD International …, 2018 - dl.acm.org
Minwise hashing has become a standard tool to calculate signatures which allow direct
estimation of Jaccard similarities. While very efficient algorithms already exist for the …

Bidirectionally densifying LSH sketches with empty bins

P Jia, P Wang, J Zhao, S Zhang, Y Qi, M Hu… - Proceedings of the …, 2021 - dl.acm.org
As an efficient tool for approximate similarity computation and search, Locality Sensitive
Hashing (LSH) has been widely used in many research areas including databases, data …

Consistent weighted sampling made more practical

W Wu, B Li, L Chen, C Zhang - … of the 26th international conference on …, 2017 - dl.acm.org
Min-Hash, which is widely used for efficiently estimating similarities of bag-of-words
represented data, plays an increasingly important role in the era of big data. It has been …

Improved consistent weighted sampling revisited

W Wu, B Li, L Chen, C Zhang… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary
sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch …

K-Ary Tree Hashing for Fast Graph Classification

W Wu, B Li, L Chen, X Zhu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Existing graph classification usually relies on an exhaustive enumeration of substructure
patterns, where the number of substructures expands exponentially with the size of the …