Big data analytics: a survey

CW Tsai, CF Lai, HC Chao, AV Vasilakos - Journal of Big Data, 2015 - Springer
The age of big data is now coming. But the traditional data analytics may not be able to
handle such large quantities of data. The question that arises now is, how to develop a high …

A survey of open source tools for machine learning with big data in the Hadoop ecosystem

S Landset, TM Khoshgoftaar, AN Richter, T Hasanin - Journal of Big Data, 2015 - Springer
With an ever-increasing number of options, the task of selecting machine learning tools for
big data can be difficult. The available tools have advantages and drawbacks, and many …

A survey of parallel sequential pattern mining

W Gan, JCW Lin, P Fournier-Viger, HC Chao… - ACM Transactions on …, 2019 - dl.acm.org
With the growing popularity of shared resources, large volumes of complex data of different
types are collected automatically. Traditional data mining algorithms generally have …

Protection of big data privacy

A Mehmood, I Natgunanathan, Y Xiang, G Hua… - IEEE …, 2016 - ieeexplore.ieee.org
In recent years, big data have become a hot research topic. The increasing amount of big
data also increases the chance of breaching the privacy of individuals. Since big data …

ApproxHadoop: Bringing approximations to MapReduce frameworks

I Goiri, R Bianchini, S Nagarakatte… - Proceedings of the …, 2015 - dl.acm.org
We propose and evaluate a framework for creating and running approximation-enabled
MapReduce programs. Specifically, we propose approximation mechanisms that fit naturally …

Frequent itemset mining for big data

S Moens, E Aksehirli, B Goethals - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
Frequent Itemset Mining (FIM) is one of the most well known techniques to extract
knowledge from data. The combinatorial explosion of FIM methods becomes even more …

Data mining in distributed environment: a survey

W Gan, JCW Lin, HC Chao… - … Reviews: Data Mining and …, 2017 - Wiley Online Library
Due to the rapid growth of resource sharing, distributed systems are developed, which can
be used to utilize the computations. Data mining (DM) provides powerful techniques for …

FiDoop: Parallel mining of frequent itemsets using MapReduce

Y Xun, J Zhang, X Qin - IEEE Transactions on Systems, Man …, 2015 - ieeexplore.ieee.org
Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables
automatic parallelization, load balancing, data distribution, and fault tolerance on large …

Performance optimization of MapReduce-based Apriori algorithm on Hadoop cluster

S Singh, R Garg, PK Mishra - Computers & Electrical Engineering, 2018 - Elsevier
Many techniques have been proposed to implement the Apriori algorithm on MapReduce
framework but only a few have focused on performance improvement. FPC (Fixed Passes …

Adaptive-Miner: an efficient distributed association rule mining algorithm on Spark

S Rathee, A Kashyap - Journal of Big Data, 2018 - Springer
Extraction of valuable data from extensive datasets is a standout amongst the most vital
exploration issues. Association rule mining is one of the highly used methods for this …