V-smart-join: A scalable mapreduce framework for all-pair similarity joins of multisets and vectors

S Sakr, A Liu, AG Fayoumi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org

In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …

Cited by 257 Related articles All 9 versions

[PDF] psu.edu

Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org

MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

Cited by 249 Related articles All 15 versions

[PDF] acm.org

Josie: Overlap set similarity search for finding joinable tables in data lakes

E Zhu, D Deng, F Nargesian, RJ Miller - Proceedings of the 2019 …, 2019 - dl.acm.org

We present a new solution for finding joinable tables in massive data lakes: given a table
and one join column, find tables that can be joined with the given table on the largest …

Cited by 201 Related articles All 7 versions

[PDF] princeton.edu

A survey of large-scale analytical query processing in MapReduce

C Doulkeridis, K Nørvåg - The VLDB journal, 2014 - Springer

Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …

Cited by 339 Related articles All 15 versions

[PDF] arxiv.org

Efficient processing of k nearest neighbor joins using mapreduce

W Lu, Y Shen, S Chen, BC Ooi - arXiv preprint arXiv:1207.0141, 2012 - arxiv.org

k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for
every object in another dataset R, is a primitive operation widely adopted by many data …

Cited by 363 Related articles All 15 versions

[PDF] researchgate.net

An empirical evaluation of set similarity join techniques

W Mann, N Augsten, P Bouros - Proceedings of the VLDB Endowment, 2016 - dl.acm.org

Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct
extensive experiments on seven state-of-the-art algorithms for set similarity joins. These …

Cited by 172 Related articles All 13 versions

[PDF] hep.com.cn

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer

String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

Cited by 175 Related articles All 17 versions

[PDF] vldb.org

MapReduce algorithms for big data analysis

K Shim - International Workshop on Databases in Networked …, 2013 - Springer

As there is an increasing trend of applications being expected to deal with big data that
usually do not fit in the main memory of a single machine, analyzing big data is a …

Cited by 234 Related articles All 9 versions

[PDF] vldb.org

String similarity joins: An experimental evaluation

Y Jiang, G Li, J Feng, WS Li - Proceedings of the VLDB Endowment, 2014 - dl.acm.org

String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …

Cited by 205 Related articles All 11 versions

[PDF] tsinghua.edu.cn

Massjoin: A mapreduce-based method for scalable string similarity joins

D Deng, G Li, S Hao, J Wang… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org

String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …

Cited by 160 Related articles All 18 versions
