Stability analysis of feature ranking techniques on biological datasets

D Dittman, TM Khoshgoftaar, R Wald… - … on Bioinformatics and …, 2011 - ieeexplore.ieee.org
2011 IEEE International Conference on Bioinformatics and Biomedicine, 2011
One major problem faced when analyzing DNA microarrays is their high dimensionality (large number of features). Therefore, feature selection is a necessary step when using these datasets. However, the addition or removal of instances can alter the subsets chosen by a feature selection technique. The ideal situation is to choose a feature selection technique that is robust (stable) to changes in the number of instances, with selected features changing little even when instances are added or removed. In this study we test the stability of nineteen feature selection techniques across twenty-six datasets with varying levels of class imbalance. Our results show that the best choice of technique depends on the class balance of the datasets. The top performers are Deviance for balanced datasets, Signal to Noise for slightly unbalanced datasets, and AUC for unbalanced datasets. SVM-RFE was the least stable feature selection technique across the board, while other poor performers include Gain Ratio, Gini Index, Probability Ratio, and Power. We also found that enough changes to the dataset can make any feature selection technique unstable, and that using more features increases the stability of most feature selection techniques. Most intriguing was our finding that the more imbalanced a dataset is, the more stable the feature subsets built for that dataset will be. Overall, we conclude that stability is an important aspect of feature ranking which must be taken into account when planning a feature selection strategy or when adding or removing instances from a dataset.
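The notion of stability described in the abstract can be illustrated with a minimal sketch. The abstract does not say which stability measure or perturbation scheme the authors used, so everything below is an assumption made for illustration: the overlap measure (fraction of the full-data top-k features that remain in the top k after instances are removed), the per-class subsampling, the particular signal-to-noise formula, and the synthetic data. The function names are hypothetical.

```python
import numpy as np


def signal_to_noise(X, y):
    """Score each feature as |mean_pos - mean_neg| / (std_pos + std_neg).

    This is one common form of the signal-to-noise ranker; the paper's
    exact definition is not given in the abstract (assumption).
    """
    pos, neg = X[y == 1], X[y == 0]
    spread = pos.std(axis=0) + neg.std(axis=0)
    return np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / (spread + 1e-12)


def top_k(scores, k):
    """Return the indices of the k highest-scoring features as a set."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def stratified_subsample(y, keep_fraction, rng):
    """Pick row indices, keeping roughly keep_fraction of each class."""
    idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        n_keep = min(len(members), max(2, int(round(keep_fraction * len(members)))))
        idx.append(rng.choice(members, size=n_keep, replace=False))
    return np.concatenate(idx)


def stability(X, y, k, keep_fraction, n_repeats, rng):
    """Mean fraction of the full-data top-k features that survive subsampling.

    Illustrative overlap measure only, not the measure used in the paper.
    """
    reference = top_k(signal_to_noise(X, y), k)
    overlaps = []
    for _ in range(n_repeats):
        idx = stratified_subsample(y, keep_fraction, rng)
        subset = top_k(signal_to_noise(X[idx], y[idx]), k)
        overlaps.append(len(reference & subset) / k)
    return float(np.mean(overlaps))


# Synthetic stand-in for a microarray dataset: 60 instances, 200 features,
# of which the first 10 are shifted in the positive class.
rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)
X = rng.normal(size=(60, 200))
X[y == 1, :10] += 3.0

for fraction in (0.9, 0.7, 0.5):
    score = stability(X, y, k=10, keep_fraction=fraction, n_repeats=20, rng=rng)
    print(f"keep {fraction:.0%} of instances: mean top-10 overlap = {score:.2f}")
```

A value of 1.0 means the top-k subset did not change when instances were removed; lower values mean the ranker is less stable under that perturbation. Swapping `signal_to_noise` for another scoring function with the same signature would let the same loop compare rankers.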