A distributed methodology for imbalanced classification problems

C Lemnaru, M Cuibus, A Bona, A Alic… - … on Parallel and …, 2012 - ieeexplore.ieee.org
2012 11th International Symposium on Parallel and Distributed …, 2012 - ieeexplore.ieee.org
Current important challenges in data mining research are triggered by the need to address various particularities of real-world problems, such as imbalanced data and error cost distributions. This paper presents Distributed Evolutionary Cost-Sensitive Balancing, a distributed methodology for dealing with imbalanced data and, if necessary, cost distributions. The method employs a genetic algorithm to search for an optimal cost matrix and base classifier settings, which are then employed by a cost-sensitive classifier wrapped around the base classifier. Individual fitness computation is the most intensive task in the algorithm, but it also presents a high parallelization potential. Two different parallelization alternatives have been explored: a computation-driven approach and a data-driven approach. Both have been developed within the Apache Watchmaker framework and deployed on Hadoop-based infrastructures. Experimental evaluations performed so far indicate that the computation-driven approach achieves good classification performance but does not reduce the running time significantly, whereas the data-driven approach reduces the running time for slow algorithms, such as kNN and SVM, while still yielding important performance improvements.
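
The general idea in the abstract (a genetic algorithm searching for a cost matrix, with per-individual fitness evaluated in parallel) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the paper: the synthetic one-dimensional data, the Gaussian base classifier, the geometric mean of true-positive and true-negative rates as the fitness measure, the GA operators, and all function names. A local thread pool stands in for the Hadoop deployment, and only the two misclassification costs are evolved, not the base classifier settings. Fitness is measured on the training data for brevity.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_data(rng, n_major=300, n_minor=30):
    # Hypothetical imbalanced toy data: class 0 around 0.0, class 1 around 1.5.
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(1.5, 1.0), 1) for _ in range(n_minor)]
    return data


def fit_base(data):
    # Stand-in base classifier: unit-variance Gaussian per class, empirical prior.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return {"m0": sum(neg) / len(neg), "m1": sum(pos) / len(pos),
            "p1": len(pos) / len(data)}


def posterior(model, x):
    # Estimated probability that x belongs to the minority class (label 1).
    l1 = model["p1"] * math.exp(-0.5 * (x - model["m1"]) ** 2)
    l0 = (1.0 - model["p1"]) * math.exp(-0.5 * (x - model["m0"]) ** 2)
    return l1 / (l0 + l1)


def fitness(individual, model, data):
    # Cost-sensitive wrapper: predict 1 when the expected cost of missing a
    # positive exceeds the expected cost of a false alarm.
    cost_fn, cost_fp = individual
    tp = tn = pos = neg = 0
    for x, y in data:
        p = posterior(model, x)
        pred = 1 if p * cost_fn > (1.0 - p) * cost_fp else 0
        if y == 1:
            pos += 1
            tp += pred
        else:
            neg += 1
            tn += 1 - pred
    # Geometric mean of true-positive and true-negative rates (assumed metric).
    return math.sqrt((tp / pos) * (tn / neg))


def evolve(model, data, rng, pop_size=20, generations=15, workers=4):
    # Seed the population with the neutral cost matrix (1, 1) plus random ones.
    population = [(1.0, 1.0)] + [
        (rng.uniform(0.1, 20.0), rng.uniform(0.1, 20.0))
        for _ in range(pop_size - 1)
    ]

    def score(ind):
        return fitness(ind, model, data)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness of every individual is evaluated concurrently.
            scores = list(pool.map(score, population))
            ranked = sorted(zip(scores, population), reverse=True)
            children = [ranked[0][1]]  # elitism: keep the best individual
            while len(children) < pop_size:
                a = max(rng.sample(ranked, 3))[1]  # tournament selection
                b = max(rng.sample(ranked, 3))[1]
                w = rng.random()
                # Arithmetic crossover plus Gaussian mutation, clamped to [0.1, 20].
                child = tuple(
                    min(20.0, max(0.1, w * ga + (1.0 - w) * gb + rng.gauss(0.0, 0.5)))
                    for ga, gb in zip(a, b)
                )
                children.append(child)
            population = children
        scores = list(pool.map(score, population))
    best_score, best = max(zip(scores, population))
    return best, best_score


rng = random.Random(0)
data = make_data(rng)
model = fit_base(data)
baseline = fitness((1.0, 1.0), model, data)
best, best_score = evolve(model, data, rng)
print("neutral costs G-mean:", round(baseline, 3))
print("evolved costs (FN, FP):", best, "G-mean:", round(best_score, 3))
```

Because the neutral cost matrix is in the initial population and the best individual is always carried forward, the evolved score in this sketch can never fall below the neutral baseline. Parallelizing over individuals, as done here, loosely mirrors what the abstract calls the computation-driven approach; partitioning the data instead would be closer to the data-driven one.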