Authors
Peter Langfelder, Steve Horvath
Publication date
2012/3/7
Journal
Journal of Statistical Software
Volume
46
Issue
11
Publisher
NIH Public Access
Description
Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation handles data without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing values. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA.
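As a rough illustration of why missing entries need not force a slow pair-by-pair loop, the sketch below computes pairwise-complete Pearson correlations using only matrix products over a presence mask. It is a generic NumPy technique written for this note, not the WGCNA implementation; the function name `pairwise_pearson` and its layout (rows are samples, columns are variables, NaN marks a missing entry) are assumptions made here for illustration.

```python
import numpy as np


def pairwise_pearson(x):
    """Pairwise-complete Pearson correlation between the columns of x.

    Hypothetical helper for illustration only; NaN marks a missing entry.
    For each column pair (i, j), only rows where both entries are present
    contribute to the sums.
    """
    x = np.asarray(x, dtype=float)
    present = ~np.isnan(x)
    m = present.astype(float)          # 1.0 where observed, 0.0 where missing
    x0 = np.where(present, x, 0.0)     # missing entries contribute zero
    x0sq = x0 * x0

    n = m.T @ m                        # number of shared rows per pair
    sum_x = x0.T @ m                   # sum of column i over shared rows
    sum_y = sum_x.T                    # sum of column j over shared rows
    sum_xy = x0.T @ x0                 # sum of products over shared rows
    sum_xx = x0sq.T @ m                # sum of squares of column i
    sum_yy = sum_xx.T                  # sum of squares of column j

    # Pairs with no shared rows or zero variance yield NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        cov = sum_xy - sum_x * sum_y / n
        var_x = sum_xx - sum_x ** 2 / n
        var_y = sum_yy - sum_y ** 2 / n
        r = cov / np.sqrt(var_x * var_y)

    # Guard against rounding that pushes values slightly outside [-1, 1].
    return np.clip(r, -1.0, 1.0)
```

The raw-sum formulas used here trade some numerical stability for brevity; a production implementation would more likely center the data first and treat columns without missing values separately.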
Total citations
[Bar chart: citations per year, 2012 to 2024]