Set similarity join is a fundamental and well-studied database operator. It is usually studied in the exact setting where the goal is to compute all pairs of sets that exceed a given …
A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join) …
X Hu, K Yi, Y Tao - ACM Transactions on Database Systems (TODS), 2019 - dl.acm.org
Parallel join algorithms have received much attention in recent years due to the rapid development of massively parallel systems such as MapReduce and Spark. In the database …
Systems and methods which provide secure queries with respect to encrypted datasets are described. Embodiments provide privacy-assured similarity join techniques …
X Hu, Y Tao, K Yi - Proceedings of the 36th ACM SIGMOD-SIGACT …, 2017 - dl.acm.org
Parallel join algorithms have received much attention in recent years, due to the rapid development of massively parallel systems such as MapReduce and Spark. In the database …
We present an I/O-efficient algorithm for computing similarity joins based on locality-sensitive hashing (LSH). In contrast to the filtering methods commonly suggested our …
Similarity joins are a fundamental database operation. Given data sets S and R, the goal of a similarity join is to find all points x ∈ S and y ∈ R with distance at most r. Recent research …
Set similarity join, as well as the corresponding indexing problem set similarity search, are fundamental primitives for managing noisy or uncertain data. For example, these primitives …
P Beame, C Rashtchian - Proceedings of the Twenty-Eighth Annual ACM …, 2017 - SIAM
We study distributed protocols for finding all pairs of similar vectors in a large dataset. Our results pertain to a variety of discrete metrics, and we give concrete instantiations for …