A study of software metric selection techniques: Stability analysis and defect prediction model performance

H Wang, TM Khoshgoftaar, Q Liang - International journal on …, 2013 - World Scientific
International journal on artificial intelligence tools, 2013 - World Scientific
Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very limited research has considered stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability while two different versions of ReliefF show the most stability, followed by the PRC- and AUC-based threshold-based feature selection techniques. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while stability and classification performance are correlated for some rankers, this is not true for others, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
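The idea of ranker stability under dataset perturbation can be illustrated with a minimal Python sketch. It is not the paper's procedure: the abstract does not state which stability measure was used, so the Jaccard overlap between top-k subsets is an assumption here, as are the stratified subsampling scheme and every function name (`s2n_scores`, `top_k`, `jaccard`, `stability`). The signal-to-noise score is taken in its common form, the absolute difference of class means divided by the sum of class standard deviations.

```python
import random
import statistics


def s2n_scores(X, y):
    """Signal-to-noise score per feature: |mean_pos - mean_neg| / (std_pos + std_neg).

    X is a list of rows (one per module), y holds 1 for defective, 0 otherwise.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            scores.append(0.0)  # score is undefined without both classes
            continue
        spread = statistics.pstdev(pos) + statistics.pstdev(neg)
        signal = abs(statistics.mean(pos) - statistics.mean(neg))
        scores.append(signal / spread if spread > 0 else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, as a set."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Overlap of two feature subsets (assumed stability measure)."""
    return len(a & b) / len(a | b)


def stability(X, y, k, keep_fraction, n_trials, seed=0):
    """Mean overlap between the full-data top-k subset and the top-k subsets
    selected after keeping only keep_fraction of the modules in each class."""
    rng = random.Random(seed)
    reference = top_k(s2n_scores(X, y), k)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    overlaps = []
    for _ in range(n_trials):
        idx = rng.sample(pos_idx, max(1, int(len(pos_idx) * keep_fraction)))
        idx += rng.sample(neg_idx, max(1, int(len(neg_idx) * keep_fraction)))
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        overlaps.append(jaccard(reference, top_k(s2n_scores(Xs, ys), k)))
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    # Synthetic data (illustrative only): 60 modules, 5 metrics,
    # with metric 0 shifted upward for defective modules.
    rng = random.Random(42)
    y = [1 if i < 20 else 0 for i in range(60)]
    X = [[rng.gauss(2.0 * label if j == 0 else 0.0, 1.0) for j in range(5)]
         for label in y]
    for fraction in (0.95, 0.8, 0.67):
        print(fraction, stability(X, y, k=2, keep_fraction=fraction, n_trials=30))
```

Varying `keep_fraction` and `k` mirrors the two factors the abstract says were varied: the magnitude of change to the dataset and the size of the selected feature subset.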