S Gollapudi, A Sharma - … of the 18th international conference on World …, 2009 - dl.acm.org
Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to …
W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins many high-level applications based on similarity measures in machine learning and data …
Motivation: Sequence alignment is a central operation in bioinformatics pipelines and, despite many improvements, remains a computationally challenging problem. Locality-sensitive …
Consistent Weighted Sampling. Mark Manasse, Microsoft Research, SVC, manasse@microsoft.com; Frank McSherry, Microsoft Research, SVC …
Estimating set similarity is a central problem in many computer applications. In this paper we introduce the Odd Sketch, a compact binary sketch for estimating the Jaccard similarity of …
A Shrivastava - Advances in Neural Information Processing …, 2016 - proceedings.neurips.cc
Weighted minwise hashing (WMH) is one of the fundamental subroutines required by many celebrated approximation algorithms, commonly adopted in industrial practice for large …
X Li, P Li - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Efficiently computing the weighted Jaccard similarity has become an active research topic in machine learning and theory. For sparse data, the standard technique is based on the …
O Ertl - Proceedings of the 24th ACM SIGKDD International …, 2018 - dl.acm.org
Minwise hashing has become a standard tool to calculate signatures which allow direct estimation of Jaccard similarities. While very efficient algorithms already exist for the …
I Roy, R Agarwal, S Chakrabarti… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many search applications related to passage retrieval, text entailment, and subgraph search, the query and each 'document' is a set of elements, with a document being relevant if …