A survey of query result diversification

K Zheng, H Wang, Z Qi, J Li, H Gao - Knowledge and Information Systems, 2017 - Springer
Nowadays, in information systems such as web search engines and databases, diversity is
becoming increasingly essential and getting more and more attention for improving users' …

An axiomatic approach for result diversification

S Gollapudi, A Sharma - … of the 18th international conference on World …, 2009 - dl.acm.org
Understanding user intent is key to designing an effective ranking system in a search
engine. In the absence of any explicit knowledge of user intent, search engines want to …

A review for weighted MinHash algorithms

W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …

Locality-sensitive hashing for the edit distance

G Marçais, D DeBlasio, P Pandey, C Kingsford - Bioinformatics, 2019 - academic.oup.com
Motivation: Sequence alignment is a central operation in bioinformatics pipelines and, despite
many improvements, remains a computationally challenging problem. Locality-sensitive …

Consistent weighted sampling

M Manasse, F McSherry, K Talwar - Unpublished technical report …, 2010 - academia.edu
Mark Manasse, Microsoft Research, SVC, manasse@microsoft.com; Frank McSherry, Microsoft
Research, SVC …

Efficient estimation for high similarities using odd sketches

M Mitzenmacher, R Pagh, N Pham - Proceedings of the 23rd …, 2014 - dl.acm.org
Estimating set similarity is a central problem in many computer applications. In this paper we
introduce the Odd Sketch, a compact binary sketch for estimating the Jaccard similarity of …

Simple and efficient weighted minwise hashing

A Shrivastava - Advances in Neural Information Processing …, 2016 - proceedings.neurips.cc
Weighted minwise hashing (WMH) is one of the fundamental subroutines required by many
celebrated approximation algorithms, commonly adopted in industrial practice for large …

Rejection sampling for weighted Jaccard similarity revisited

X Li, P Li - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Efficiently computing the weighted Jaccard similarity has become an active research topic in
machine learning and theory. For sparse data, the standard technique is based on the …

BagMinHash - minwise hashing algorithm for weighted sets

O Ertl - Proceedings of the 24th ACM SIGKDD International …, 2018 - dl.acm.org
Minwise hashing has become a standard tool to calculate signatures which allow direct
estimation of Jaccard similarities. While very efficient algorithms already exist for the …

Locality sensitive hashing in Fourier frequency domain for soft set containment search

I Roy, R Agarwal, S Chakrabarti… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many search applications related to passage retrieval, text entailment, and subgraph
search, the query and each 'document' is a set of elements, with a document being relevant if …