MiSeRe-hadoop: a large-scale robust sequential classification rules mining framework

E Egho, D Gay, R Trinquart, M Boullé, N Voisine… - Big Data Analytics and …, 2017 - Springer
E Egho, D Gay, R Trinquart, M Boullé, N Voisine, F Clérot
Big Data Analytics and Knowledge Discovery: 19th International Conference …, 2017, Springer
Abstract
Sequence classification has become a fundamental problem in data mining and machine learning. Feature-based classification is one of the most widely used techniques for sequence classification, and mining sequential classification rules plays an important role in it. Despite the abundant literature in this area, mining sequential classification rules is still a challenge; few of the available methods are sufficiently scalable to handle large-scale datasets. MapReduce is an ideal framework to support distributed computing on large datasets across clusters of computers. In this paper, we propose a distributed version of the MiSeRe algorithm on MapReduce, called MiSeRe-Hadoop. MiSeRe-Hadoop retains the valuable properties of MiSeRe: (i) it is a robust, user-parameter-free, anytime algorithm, and (ii) it employs an instance-based randomized strategy to promote diversity in the mined rules. We have applied our method to two large real-world datasets: a marketing dataset and a text dataset. Our results confirm that the method scales to large-scale sequential data analysis.