Output-optimal parallel algorithms for similarity joins

X Hu, K Yi, Y Tao - ACM Transactions on Database Systems (TODS), 2019 - dl.acm.org

Parallel join algorithms have received much attention in recent years due to the rapid
development of massively parallel systems such as MapReduce and Spark. In the database …

Cited by 30 · Related articles · All 7 versions

Adaptive distributed streaming similarity joins

G Siachamis, K Psarakis, M Fragkoulis… - Proceedings of the 17th …, 2023 - dl.acm.org

How can we perform similarity joins of multi-dimensional streams in a distributed fashion,
achieving low latency? Can we adaptively repartition those streams in order to retain high …

Cited by 3 · Related articles · All 7 versions

Distance-sensitive hashing

M Aumüller, T Christiani, R Pagh… - Proceedings of the 37th …, 2018 - dl.acm.org

Locality-sensitive hashing (LSH) is an important tool for managing high-dimensional noisy
or uncertain data, for example in connection with data cleaning (similarity join) and noise …

Cited by 33 · Related articles · All 9 versions

Efficient set containment join

J Yang, W Zhang, S Yang, Y Zhang, X Lin, L Yuan - The VLDB Journal, 2018 - Springer

In this paper, we study the problem of set containment join. Given two collections R and S
of records, the set containment join R ⋈_⊆ S retrieves all record pairs {(r, s)} ∈ R …

Cited by 18 · Related articles · All 4 versions

Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality

X Tang, F Zhang, S Zhang, Y Liu, B He, B He… - Proceedings of the …, 2024 - dl.acm.org

Sampling is one of the most widely employed approximations in big data processing.
Among various challenges in sampling design, sampling for join is particularly intriguing yet …

An industrial dynamic skyline based similarity joins for multidimensional big data applications

B Yin, X Wei, J Wang, N Xiong… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

In the era of data deluge, data analysis has become a key task for many industrial
applications, e.g., master data management, and data integration. In particular, similarity join …

Cited by 13 · Related articles

Instance and output optimal parallel algorithms for acyclic joins

X Hu, K Yi - Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI …, 2019 - dl.acm.org

Massively parallel join algorithms have received much attention in recent years, while most
prior work has focused on worst-case optimal algorithms. However, the worst-case optimality of …

Cited by 16 · Related articles · All 12 versions

A scalable similarity join algorithm based on MapReduce and LSH

S Rivault, M Bamha, S Limet, S Robert - International Journal of Parallel …, 2022 - Springer

Similarity joins are recognized to be among the most useful data processing and analysis
operations. A similarity join is used to retrieve all data pairs whose distances are smaller …

Cited by 4 · Related articles · All 5 versions

Jodes: Efficient Oblivious Join in the Distributed Setting

Y Wang, X Zeng, S Wang, F Li - arXiv preprint arXiv:2501.09334, 2025 - arxiv.org

Trusted execution environment (TEE) has provided an isolated and secure environment for
building cloud-based analytic systems, but it still suffers from access pattern leakages …

Massively parallel entity matching with linear classification in low dimensional space

Y Tao - 21st International Conference on Database Theory …, 2018 - drops.dagstuhl.de

In entity matching classification, we are given two sets R and S of objects where whether r
and s form a match is known for each pair (r, s) in R x S. If R and S are subsets of domains D …

Cited by 15 · Related articles · All 10 versions
