Privacy-preserving similarity joins over encrypted data

X Yuan, X Wang, C Wang, C Yu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Similarity search on high-dimensional data has been intensively studied for data processing
and analytics. Despite its broad applicability, data security and privacy concerns along the …

Scalable and robust set similarity join

T Christiani, R Pagh, J Sivertsen - 2018 IEEE 34th international …, 2018 - ieeexplore.ieee.org
Set similarity join is a fundamental and well-studied database operator. It is usually studied
in the exact setting where the goal is to compute all pairs of sets that exceed a given …

On the complexity of inner product similarity join

TD Ahle, R Pagh, I Razenshteyn… - Proceedings of the 35th …, 2016 - dl.acm.org
A number of tasks in classification, information retrieval, recommendation systems, and
record linkage reduce to the core problem of inner product similarity join (IPS join) …

Output-optimal massively parallel algorithms for similarity joins

X Hu, K Yi, Y Tao - ACM Transactions on Database Systems (TODS), 2019 - dl.acm.org
Parallel join algorithms have received much attention in recent years due to the rapid
development of massively parallel systems such as MapReduce and Spark. In the database …

Systems and methods for privacy-assured similarity joins over encrypted datasets

C Wang, S Nutanong, X Yuan, X Wang… - US Patent 10,496,638, 2019 - Google Patents
Systems and methods which provide secure queries with respect to encrypted
datasets are described. Embodiments provide privacy-assured similarity join techniques …

Output-optimal parallel algorithms for similarity joins

X Hu, Y Tao, K Yi - Proceedings of the 36th ACM SIGMOD-SIGACT …, 2017 - dl.acm.org
Parallel join algorithms have received much attention in recent years, due to the rapid
development of massively parallel systems such as MapReduce and Spark. In the database …

I/O-efficient similarity join

R Pagh, N Pham, F Silvestri, M Stöckel - Algorithmica, 2017 - Springer
We present an I/O-efficient algorithm for computing similarity joins based on locality-
sensitive hashing (LSH). In contrast to the filtering methods commonly suggested, our …

Adaptive MapReduce similarity joins

S McCauley, F Silvestri - Proceedings of the 5th ACM SIGMOD …, 2018 - dl.acm.org
Similarity joins are a fundamental database operation. Given data sets S and R, the goal of a
similarity join is to find all points x ∈ S and y ∈ R with distance at most r. Recent research …

Set similarity search for skewed data

S McCauley, JW Mikkelsen, R Pagh - … of the 37th ACM SIGMOD-SIGACT …, 2018 - dl.acm.org
Set similarity join, as well as the corresponding indexing problem set similarity search, are
fundamental primitives for managing noisy or uncertain data. For example, these primitives …

Massively-parallel similarity join, edge-isoperimetry, and distance correlations on the hypercube

P Beame, C Rashtchian - Proceedings of the Twenty-Eighth Annual ACM …, 2017 - SIAM
We study distributed protocols for finding all pairs of similar vectors in a large dataset. Our
results pertain to a variety of discrete metrics, and we give concrete instantiations for …