Authors
Yunming Ye, Qingyao Wu, Joshua Zhexue Huang, Michael K Ng, Xutao Li
Publication date
2013/3/1
Journal
Pattern Recognition
Volume
46
Issue
3
Pages
769-787
Publisher
Pergamon
Description
For high dimensional data a large portion of features are often not informative of the class of the objects. Random forest algorithms tend to use a simple random sampling of features in building their decision trees and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method to select the feature subspaces for random forests with high dimensional data. The key idea is to stratify features into two groups. One group will contain strong informative features and the other weak informative features. Then, for feature subspace selection, we randomly select features from each group proportionally. The advantage of stratified sampling is that we can ensure that each subspace contains enough informative features for classification in high dimensional data. Testing on both synthetic data and various real data sets in gene classification, image …
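The stratified subspace selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `stratify_features` and `stratified_subspace` and the parameter `strong_fraction` are hypothetical. Per-feature informativeness scores are assumed to be precomputed by the caller, because the abstract does not say how informativeness is measured. The sketch also assumes that "proportionally" means each group contributes features in proportion to its size. The rule that every subspace receives at least one strong feature is an illustrative choice added here.

```python
import random
from typing import List, Sequence, Tuple


def stratify_features(scores: Sequence[float],
                      strong_fraction: float = 0.2) -> Tuple[List[int], List[int]]:
    """Split feature indices into a strong and a weak group.

    Assumption (hypothetical): the top `strong_fraction` of features by
    informativeness score form the strong group; the rest are weak.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    n_strong = max(1, round(strong_fraction * len(scores)))
    return order[:n_strong], order[n_strong:]


def stratified_subspace(strong: List[int], weak: List[int],
                        subspace_size: int, rng: random.Random) -> List[int]:
    """Draw one feature subspace, sampling each group in proportion to its size.

    Assumption (hypothetical): at least one strong feature is always drawn,
    so no subspace is made up only of weakly informative features.
    """
    total = len(strong) + len(weak)
    k_strong = max(1, round(subspace_size * len(strong) / total))
    k_strong = min(k_strong, len(strong), subspace_size)
    # Fill the remaining slots from the weak group, without replacement.
    k_weak = min(subspace_size - k_strong, len(weak))
    return rng.sample(strong, k_strong) + rng.sample(weak, k_weak)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.random() for _ in range(1000)]  # stand-in scores
    strong, weak = stratify_features(scores, strong_fraction=0.05)
    # One subspace per tree; each would be passed to a tree learner.
    for _ in range(3):
        print(sorted(stratified_subspace(strong, weak, 20, rng)))
```

Plain random sampling of 20 features out of 1000 can easily miss all 50 strong features. The stratified draw guarantees that every subspace contains some of them, which is the property the abstract emphasizes.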
Total citations
[Chart: citations per year, 2013 to 2024; per-year counts not recoverable from the extracted text]