Software defect prediction based on feature subset selection and ensemble classification

AA Saifan, L Abu-wardih
ECTI Transactions on Computer and Information Technology (ECTI-CIT), 2020 - ph01.tci-thaijo.org
Abstract
Two primary issues have emerged in the machine learning and data mining community: how to deal with imbalanced data and how to choose appropriate features. These are of particular concern in the software engineering domain, and more specifically in the field of software defect prediction. This research presents a procedure that combines a feature selection technique, to single out relevant attributes, with an ensemble technique, to handle the class-imbalance issue. To determine the advantages of feature selection and ensemble methods, we look at two scenarios: (1) ensemble models constructed from the original datasets, without feature selection; (2) ensemble models constructed from the reduced datasets after feature selection has been applied. Four feature selection techniques are employed: Principal Component Analysis (PCA), Pearson’s correlation, Greedy Stepwise forward selection, and Information Gain (IG). The aim of this research is to assess the effectiveness of feature selection techniques when used with ensemble techniques. Five datasets, obtained from the PROMISE software repository, are analyzed. Tentative results indicate that ensemble methods can improve the model's performance without the use of feature selection techniques. PCA feature selection combined with bagging based on K-NN performs better than both bagging based on SVM and boosting based on K-NN and SVM, while the other feature selection techniques (Pearson’s correlation, Greedy Stepwise, and IG) weaken the ensemble models’ performance.
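The two scenarios in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that "Pearson's correlation" selection means ranking each feature by its absolute correlation with the defect label, and that bagging means a majority vote over K-NN classifiers trained on bootstrap samples. All function names, parameters, and the toy data are hypothetical; the data is synthetic and not drawn from the PROMISE datasets, so the output says nothing about the paper's results.

```python
import math
import random
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_by_pearson(X, y, top_k):
    """Indices of the top_k features with the largest |correlation| to the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:top_k])

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def bagged_knn_predict(train_X, train_y, x, k=3, n_estimators=11, seed=0):
    """Bagging: each K-NN votes after training on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        votes.append(knn_predict([train_X[i] for i in idx],
                                 [train_y[i] for i in idx], x, k))
    return Counter(votes).most_common(1)[0][0]

# Synthetic toy data (assumption): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [1.1, 4.0],
     [3.0, 2.0], [3.2, 5.0], [2.9, 1.0], [3.1, 3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defective module, 0 = clean module
query = [3.0, 3.0]

# Scenario 1: ensemble on the original feature set.
pred_full = bagged_knn_predict(X, y, query)

# Scenario 2: ensemble on the reduced feature set.
cols = select_by_pearson(X, y, top_k=1)
pred_reduced = bagged_knn_predict(project(X, cols), y, project([query], cols)[0])

print("selected features:", cols)
print("scenario 1 prediction:", pred_full)
print("scenario 2 prediction:", pred_reduced)
```

A real comparison along the lines of the abstract would evaluate both scenarios with cross-validation on each dataset and swap in the other selectors (PCA, Greedy Stepwise, IG) and base learners (SVM, boosting). This sketch only shows how the two pipelines differ structurally.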