The family of MapReduce and large-scale data processing systems

S Sakr, A Liu, AG Fayoumi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
In the last two decades, the continuous increase of computational power has produced an
overwhelming flow of data which has called for a paradigm shift in the computing …

Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

Data-intensive applications, challenges, techniques and technologies: A survey on Big Data

CLP Chen, CY Zhang - Information Sciences, 2014 - Elsevier
It is already true that Big Data has drawn huge attention from researchers in information
sciences, policy and decision makers in governments and enterprises. As the speed of …

Parallel data processing with MapReduce: a survey

KH Lee, YJ Lee, H Choi, YD Chung, B Moon - ACM SIGMOD Record, 2012 - dl.acm.org
A prominent parallel data processing tool MapReduce is gaining significant momentum from
both industry and academia as the volume of data to analyze grows rapidly. While …

Big data processing in cloud computing environments

C Ji, Y Li, W Qiu, U Awada, K Li - 2012 12th international …, 2012 - ieeexplore.ieee.org
With the rapid growth of emerging applications like social network analysis, semantic Web
analysis and bioinformatics network analysis, a variety of data to be processed continues to …

A survey of large-scale analytical query processing in MapReduce

C Doulkeridis, K Nørvåg - The VLDB Journal, 2014 - Springer
Enterprises today acquire vast volumes of data from different sources and leverage this
information by means of data analysis to support effective decision-making and provide new …

Llama: leveraging columnar storage for scalable join processing in the MapReduce framework

Y Lin, D Agrawal, C Chen, BC Ooi, S Wu - Proceedings of the 2011 ACM …, 2011 - dl.acm.org
To achieve high reliability and scalability, most large-scale data warehouse systems have
adopted the cluster-based architecture. In this paper, we propose the design of a new cluster …

Unstructured data analysis on big data using map reduce

V Subramaniyaswamy, V Vijayakumar… - Procedia Computer …, 2015 - Elsevier
In the real time scenario, the volume of data used linearly increases with time. Social
networking sites like Facebook, Twitter discovered the growth of data which will be …

Performance evaluation of K-means clustering on Hadoop infrastructure

S Vats, BB Sagar - Journal of Discrete Mathematical Sciences and …, 2019 - Taylor & Francis
Today we are living with the extensive volume of information which is developing at a very
fast pace. Clustering is the process of group similar kinds of information. The serial k-means …

MapReduce parallel programming model: a state-of-the-art survey

R Li, H Hu, H Li, Y Wu, J Yang - International Journal of Parallel …, 2016 - Springer
With the development of information technologies, we have entered the era of Big Data.
Google's MapReduce programming model and its open-source implementation in Apache …