Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

M Yousef, S Jung, LC Showe, MK Showe - BMC bioinformatics, 2007 - Springer

Background

Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. Selecting the genes that are important for distinguishing the sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes, recursive cluster elimination (RCE), as an alternative to recursive feature elimination (RFE). We tested this algorithm on six datasets and compared its performance with that of two related classification procedures that use RFE.

Results

We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method used to identify correlated gene clusters, with Support Vector Machines (SVMs), a supervised machine learning classification method used to score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove the clusters of genes that contribute the least to classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Using gene clusters rather than individual genes improves supervised classification accuracy on the same data, compared with the accuracy obtained when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) is used to remove genes based on their individual discriminant weights.
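The cluster-then-eliminate loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn and NumPy, and the function name `svm_rce`, its parameters, the cross-validated accuracy used as the cluster score, the re-clustering of surviving genes in each round, and the removal of exactly one cluster per round are all illustrative assumptions. Each gene profile is standardized before K-means so that Euclidean distance roughly follows correlation, which is also an assumption of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def svm_rce(X, y, n_clusters=6, final_clusters=2, cv=3, seed=0):
    """Hypothetical sketch of recursive cluster elimination.

    X: samples x genes expression matrix; y: class label per sample.
    Returns the column indices of the genes that survive elimination.
    """
    genes = np.arange(X.shape[1])
    k = n_clusters
    while True:
        k = min(k, len(genes))
        # Each gene is one point, described by its expression across samples.
        profiles = X[:, genes].T
        # Assumption: standardize profiles so K-means distance tracks correlation.
        centered = profiles - profiles.mean(axis=1, keepdims=True)
        profiles = centered / (profiles.std(axis=1, keepdims=True) + 1e-12)
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(profiles)
        cluster_ids = np.unique(labels)
        if len(cluster_ids) <= final_clusters:
            return genes
        # Score each cluster by the cross-validated accuracy of a linear SVM
        # trained only on that cluster's genes.
        scores = [
            cross_val_score(SVC(kernel="linear"), X[:, genes[labels == c]], y, cv=cv).mean()
            for c in cluster_ids
        ]
        # Drop the lowest-scoring cluster, then re-cluster the survivors
        # into one fewer cluster on the next round.
        worst = cluster_ids[int(np.argmin(scores))]
        genes = genes[labels != worst]
        k = len(cluster_ids) - 1
```

Calling `svm_rce(X, y)` on a samples-by-genes matrix returns the indices of the retained genes. Scoring whole clusters rather than single genes is the point of contrast with RFE, which would rank and remove genes by their individual weights.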

Conclusion

SVM-RCE provides improved classification accuracy on complex microarray datasets compared with SVM-RFE or PDA-RFE applied to the same datasets. SVM-RCE identifies clusters of correlated genes that, when considered together, provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups.

Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.
