Authors
Madeline Park, Sevahn Vorperian, Sheng Wang, Angela Oliveira Pisco
Publication date
2020/1/1
Journal
bioRxiv
Publisher
Cold Spring Harbor Laboratory
Description
Single-cell RNA sequencing (scRNA-seq) enables detailed examination of a cell’s underlying regulatory networks and the molecular factors contributing to its identity. We developed scRFE (single-cell identity definition using random forests and recursive feature elimination, pronounced ‘surf’) to easily generate interpretable gene lists that accurately distinguish observations (single cells) by their features (genes), given a class of interest. scRFE is an algorithm, implemented as a Python package, that combines the classical random forest method with recursive feature elimination and cross-validation. By ranking feature importance, it finds the features necessary and sufficient to classify cells in a single-cell RNA-seq dataset. The package is compatible with Scanpy, so it integrates seamlessly into any single-cell data analysis workflow that aims to identify minimal transcriptional programs relevant to describing metadata features of the dataset. We applied scRFE to the Tabula Muris Senis and reproduced commonly known aging patterns and transcription factor reprogramming protocols, highlighting the biological value of scRFE’s learned features.
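The general technique named in the description (a random forest wrapped in recursive feature elimination with cross-validation) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the scRFE package's own API: the function name `rank_genes`, the synthetic expression matrix, the gene names, and all parameter values are hypothetical, and the class of interest is reduced to a single binary label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold


def rank_genes(X, y, gene_names, step=0.2, n_splits=3, seed=0):
    """Hypothetical helper: select genes with RF + RFE + CV, then rank them.

    X: (cells x genes) expression matrix, y: binary label per cell.
    Returns (gene, importance) pairs sorted by decreasing importance.
    """
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # RFECV repeatedly drops the least important genes and keeps the
    # feature count that gives the best cross-validated accuracy.
    selector = RFECV(estimator=forest, step=step, cv=cv, scoring="accuracy")
    selector.fit(X, y)

    kept = np.flatnonzero(selector.support_)            # indices of kept genes
    importances = selector.estimator_.feature_importances_  # one per kept gene
    order = np.argsort(importances)[::-1]
    return [(gene_names[kept[i]], float(importances[i])) for i in order]


# Synthetic stand-in for an expression matrix (assumption, not real data):
# 120 cells, 20 genes, with the label driven by the first two genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]

ranked = rank_genes(X, y, gene_names)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

In a Scanpy workflow, `X` and `y` would plausibly come from an `AnnData` object's expression matrix and one column of its `obs` metadata, with one such binary problem per label of interest; that mapping is an assumption here rather than something the description specifies.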
Author summary
scRFE is a Python package that combines the classical random forest algorithm with recursive feature elimination and cross-validation to find the features necessary and sufficient to classify cells in a single-cell RNA-seq dataset by ranking feature importance. scRFE was designed to enable straightforward integration as a part of any single-cell data analysis workflow that aims at identifying minimal transcriptional programs relevant to describing metadata features of …