DPSP: Distributed progressive sequential pattern mining on the cloud

JW Huang, SC Lin, MS Chen - … on Knowledge Discovery and Data Mining, 2010 - Springer
Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2010 - Springer
Abstract
The progressive sequential pattern mining problem has been discussed in previous research works. As data volumes grow, single processors struggle to scale up, and traditional algorithms running on a single machine may run into scalability problems. Therefore, mining progressive sequential patterns intrinsically suffers from the scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive sequential patterns. The proposed algorithm DPSP, standing for Distributed Progressive Sequential Pattern mining algorithm, is implemented on top of the Hadoop platform, which provides the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate sequential patterns, and report up-to-date frequent sequential patterns within each period of interest (POI). The experimental results show that DPSP scales well and consequently improves the performance and practicality of mining algorithms.
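The three Map/Reduce responsibilities named in the abstract (delete obsolete itemsets, update candidates, report frequent patterns per POI) can be illustrated with a small in-memory sketch. This is not the DPSP algorithm from the paper and does not use Hadoop. It is a simplified illustration under stated assumptions: each event carries a single item rather than an itemset, the POI is a sliding window of `poi` time units ending at `now`, candidates are limited to subsequences of at most `max_len` items, and all function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_sequence(events, now, poi, max_len=2):
    """Map step (hypothetical): drop obsolete events, emit candidate patterns.

    `events` is a list of (timestamp, item) pairs for one sequence.
    """
    # Delete obsolete data: keep only events inside the window (now - poi, now].
    live = sorted((t, item) for t, item in events if now - poi < t <= now)
    items = [item for _, item in live]
    patterns = set()
    for k in range(1, max_len + 1):
        # combinations() preserves input order, so each tuple is a subsequence.
        patterns.update(combinations(items, k))
    # Each sequence supports a given pattern at most once.
    return [(pattern, 1) for pattern in patterns]


def reduce_counts(pairs, min_support):
    """Reduce step (hypothetical): sum supports and keep frequent patterns."""
    totals = defaultdict(int)
    for pattern, count in pairs:
        totals[pattern] += count
    return {p: c for p, c in totals.items() if c >= min_support}


def mine_window(db, now, poi, min_support):
    """Run the map and reduce steps over every sequence for one POI window."""
    emitted = []
    for events in db.values():
        emitted.extend(map_sequence(events, now, poi))
    return reduce_counts(emitted, min_support)


if __name__ == "__main__":
    db = {
        "s1": [(1, "A"), (2, "B"), (3, "C")],
        "s2": [(2, "A"), (3, "B")],
        "s3": [(1, "B"), (3, "A")],
    }
    print(mine_window(db, now=3, poi=3, min_support=2))
```

In a real Hadoop deployment, the map and reduce functions would run on separate nodes and the window would advance as new data arrives. The sketch only shows how the per-sequence filtering and the global support counting divide between the two phases.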