The family of MapReduce and large-scale data processing systems

S Sakr, A Liu, AG Fayoumi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …

Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …
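As a rough illustration of the programming model this abstract refers to (not code from the survey), the following Python sketch runs the map, shuffle, and reduce steps in a single process to build a small inverted index, one of the search-index applications the snippet mentions. The function names map_phase, shuffle, reduce_phase, and build_inverted_index are hypothetical. A real framework would run each phase in parallel across a cluster and do the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterable[Tuple[str, str]]:
    # Hypothetical mapper: emit (term, doc_id) once per distinct term in the document.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    # Hypothetical reducer: produce the sorted posting list for one term.
    return term, sorted(doc_ids)


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    # Chain the three phases; each document is processed independently by the mapper.
    intermediate = (
        pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(t, ids) for t, ids in shuffle(intermediate).items())
```

For example, build_inverted_index({"d1": "map reduce", "d2": "Map join"}) maps "map" to ["d1", "d2"] and "reduce" to ["d1"].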

JOSIE: Overlap set similarity search for finding joinable tables in data lakes

E Zhu, D Deng, F Nargesian, RJ Miller - Proceedings of the 2019 …, 2019 - dl.acm.org
We present a new solution for finding joinable tables in massive data lakes: given a table
and one join column, find tables that can be joined with the given table on the largest …
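Below is a naive baseline for the search problem described in this snippet. It assumes each candidate table's join column is reduced to the set of its distinct values and that "joinable" is measured by set overlap. An inverted index counts how many query values each candidate column shares, and the k columns with the largest overlap are kept. This is not JOSIE's algorithm, and build_index and top_k_overlap are hypothetical names used only for illustration.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def build_index(columns: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    # Inverted index: value -> ids of the columns that contain it.
    index: Dict[str, List[str]] = defaultdict(list)
    for col_id, values in columns.items():
        for v in values:
            index[v].append(col_id)
    return index


def top_k_overlap(
    query: Set[str], index: Dict[str, List[str]], k: int
) -> List[Tuple[str, int]]:
    # Count |query ∩ column| for every column that shares at least one value.
    overlap: Counter = Counter()
    for v in query:
        for col_id in index.get(v, []):
            overlap[col_id] += 1
    # Largest overlap first; ties broken by column id for a deterministic result.
    return heapq.nsmallest(k, overlap.items(), key=lambda item: (-item[1], item[0]))
```

Reading every posting list of every query value is what makes this baseline expensive on large data lakes, which is the cost a dedicated search algorithm tries to avoid.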

A survey of large-scale analytical query processing in MapReduce

C Doulkeridis, K Nørvåg - The VLDB Journal, 2014 - Springer
Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …

Efficient processing of k nearest neighbor joins using MapReduce

W Lu, Y Shen, S Chen, BC Ooi - arXiv preprint arXiv:1207.0141, 2012 - arxiv.org
k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for
every object in another dataset R, is a primitive operation widely adopted by many data …
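The kNN join definition in this abstract can be pinned down with a brute-force sketch. It assumes points are Euclidean coordinate tuples keyed by id. It compares every object of R with every object of S, which is the quadratic cost that MapReduce-based approaches aim to distribute and prune. It is not the paper's method, and knn_join is a hypothetical name.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def knn_join(R: Dict[str, Point], S: Dict[str, Point], k: int) -> Dict[str, List[str]]:
    # For every object r in R, return the ids of its k nearest neighbors in S.
    result: Dict[str, List[str]] = {}
    for r_id, r in R.items():
        # Euclidean distance via math.dist (Python 3.8+); ties broken by id.
        nearest = heapq.nsmallest(
            k, S.items(), key=lambda item: (math.dist(r, item[1]), item[0])
        )
        result[r_id] = [s_id for s_id, _ in nearest]
    return result
```

With R = {"r1": (0.0, 0.0)} and S = {"a": (1.0, 0.0), "b": (0.0, 3.0), "c": (9.0, 9.0)}, knn_join(R, S, 2) returns {"r1": ["a", "b"]}.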

An empirical evaluation of set similarity join techniques

W Mann, N Augsten, P Bouros - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct
extensive experiments on seven state-of-the-art algorithms for set similarity joins. These …
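To make the operation concrete, here is a sketch of a Jaccard-threshold set similarity join that uses prefix filtering to generate candidates and then verifies them exactly. Prefix filtering is a filter common to many set similarity join algorithms. The sketch assumes Jaccard similarity, a threshold 0 < t <= 1, and tokens ordered by increasing global frequency. It does not reimplement any specific algorithm from the evaluation, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def prefix_length(size: int, t: float) -> int:
    # A pair with Jaccard >= t shares at least ceil(t * size) tokens, so a prefix of
    # size - ceil(t * size) + 1 tokens must overlap. The epsilon guards float round-up.
    return size - math.ceil(t * size - 1e-9) + 1


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


def set_similarity_join(
    R: Dict[str, Set[str]], S: Dict[str, Set[str]], t: float
) -> List[Tuple[str, str]]:
    # Global token order: rarest tokens first, so prefixes are short and selective.
    freq = Counter(tok for coll in (R, S) for s in coll.values() for tok in s)

    def ordered(s: Set[str]) -> List[str]:
        return sorted(s, key=lambda tok: (freq[tok], tok))

    # Index only the prefix tokens of each set in S.
    index: Dict[str, Set[str]] = defaultdict(set)
    for s_id, s in S.items():
        for tok in ordered(s)[: prefix_length(len(s), t)]:
            index[tok].add(s_id)

    result: List[Tuple[str, str]] = []
    for r_id, r in R.items():
        # Candidates share at least one prefix token with r.
        candidates: Set[str] = set()
        for tok in ordered(r)[: prefix_length(len(r), t)]:
            candidates |= index.get(tok, set())
        # Verification step: compute the exact Jaccard similarity.
        for s_id in candidates:
            if jaccard(r, S[s_id]) >= t:
                result.append((r_id, s_id))
    return sorted(result)
```

Ordering tokens from rarest to most frequent is a design choice. It keeps the posting lists of prefix tokens short, so fewer candidate pairs reach the verification step.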

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

MapReduce algorithms for big data analysis

K Shim - International Workshop on Databases in Networked …, 2013 - Springer
As there is an increasing trend of applications being expected to deal with big data that
usually do not fit in the main memory of a single machine, analyzing big data is a …

String similarity joins: An experimental evaluation

Y Jiang, G Li, J Feng, WS Li - Proceedings of the VLDB Endowment, 2014 - dl.acm.org
String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …
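Here is a minimal sketch of a string similarity join under an edit-distance threshold tau, assuming Levenshtein distance as the similarity measure. A length filter first discards pairs whose lengths differ by more than tau, since the edit distance is at least the length difference. The remaining pairs are verified with the standard dynamic program. The algorithms compared in this evaluation use stronger filters; edit_distance and string_similarity_join are hypothetical names.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein dynamic program keeping only two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution or match
        prev = curr
    return prev[-1]


def string_similarity_join(R: List[str], S: List[str], tau: int) -> List[Tuple[str, str]]:
    result = []
    for r in R:
        for s in S:
            # Length filter: edit distance is at least the length difference.
            if abs(len(r) - len(s)) <= tau and edit_distance(r, s) <= tau:
                result.append((r, s))
    return result
```

For example, string_similarity_join(["vldb", "sigmod"], ["vldbj", "icde", "sigmod"], 1) returns [("vldb", "vldbj"), ("sigmod", "sigmod")].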

MassJoin: A MapReduce-based method for scalable string similarity joins

D Deng, G Li, S Hao, J Wang… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org
String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …