Software defect prediction based on feature subset selection and ensemble classification



AA Saifan, L Abu-wardih

ECTI Transactions on Computer and Information Technology (ECTI-CIT), 2020 • ph01.tci-thaijo.org

Abstract

Two primary issues have emerged in the machine learning and data mining community: how to deal with imbalanced data and how to choose appropriate features. These are of particular concern in the software engineering domain, and more specifically in the field of software defect prediction. This research highlights a procedure that includes a feature selection technique to single out relevant attributes and an ensemble technique to handle the class-imbalance issue. To determine the advantages of feature selection and ensemble methods, we look at two potential scenarios: (1) ensemble models constructed from the original datasets, without feature selection; (2) ensemble models constructed from the reduced datasets after feature selection has been applied. Four feature selection techniques are employed: Principal Component Analysis (PCA), Pearson’s correlation, Greedy Stepwise Forward selection, and Information Gain (IG). The aim of this research is to assess the effectiveness of feature selection techniques when combined with ensemble techniques. Five datasets, obtained from the PROMISE software repository, are analyzed; tentative results indicate that ensemble methods can improve the model's performance without the use of feature selection techniques. PCA feature selection and bagging based on K-NN perform better than both bagging based on SVM and boosting based on K-NN and SVM, and feature selection techniques including Pearson’s correlation, Greedy Stepwise, and IG weaken the ensemble models’ performance.
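The two scenarios in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn as the toolkit, uses a synthetic imbalanced dataset from `make_classification` in place of the PROMISE data, and compares only bagging over K-NN with and without PCA. The function name `compare_scenarios`, the component count, the ensemble size, and the ROC AUC metric are all illustrative choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_scenarios(n_components=5, seed=0):
    """Return mean cross-validated ROC AUC for both scenarios (hypothetical helper)."""
    # Synthetic stand-in for a defect dataset: roughly 20% "defective" modules.
    # Assumption: the real study uses PROMISE datasets, not generated data.
    X, y = make_classification(
        n_samples=400,
        n_features=20,
        n_informative=5,
        weights=[0.8, 0.2],
        random_state=seed,
    )
    # Stratified folds keep the class ratio stable in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    def bagged_knn():
        # Bagging ensemble with K-NN as the base learner.
        return BaggingClassifier(
            KNeighborsClassifier(n_neighbors=5),
            n_estimators=10,
            random_state=seed,
        )

    scenarios = {
        # Scenario 1: ensemble on the original feature set.
        "no_feature_selection": make_pipeline(StandardScaler(), bagged_knn()),
        # Scenario 2: ensemble on the PCA-reduced feature set.
        "pca": make_pipeline(
            StandardScaler(), PCA(n_components=n_components), bagged_knn()
        ),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in scenarios.items()
    }


if __name__ == "__main__":
    for name, auc in compare_scenarios().items():
        print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Scores on synthetic data say nothing about the paper's findings. The sketch only shows how the two pipelines differ, and placing PCA inside the pipeline ensures the projection is fitted on the training folds only.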


Cited by: 14 | Related articles | All 8 versions
