Benchmarking algorithms for single-cell multi-omics prediction and integration

Y Hu, S Wan, Y Luo, Y Li, T Wu, W Deng, C Jiang… - Nature …, 2024 - nature.com
The development of single-cell multi-omics technology has greatly enhanced our
understanding of biology, and in parallel, numerous algorithms have been proposed to …

DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype–phenotype prediction

PB Chandrashekar, S Alatkar, J Wang, GE Hoffman… - Genome Medicine, 2023 - Springer
Background: Genotypes are strongly associated with disease phenotypes, particularly in
brain disorders. However, the molecular and cellular mechanisms behind this association …

Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS

Z He, S Hu, Y Chen, S An, J Zhou, R Liu, J Shi… - Nature …, 2024 - nature.com
Integrating single-cell datasets produced by multiple omics technologies is essential for
defining cellular heterogeneity. Mosaic integration, in which different datasets share only …

Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning

Z Cai, S Apolinário, AR Baião, C Pacini… - Nature …, 2024 - nature.com
Integrating diverse types of biological data is essential for a holistic understanding of cancer
biology, yet it remains challenging due to data heterogeneity, complexity, and sparsity …

Subsample ridge ensembles: Equivalences and generalized cross-validation

JH Du, P Patil, AK Kuchibhotla - arXiv preprint arXiv:2304.13016, 2023 - arxiv.org
We study subsampling-based ridge ensembles in the proportional asymptotics regime,
where the feature size grows proportionally with the sample size such that their ratio …

scEpiTools: a database to comprehensively interrogate analytic tools for single-cell epigenomic data

Z Gao, X Chen, Z Li, X Cui, S Chen, R Jiang - bioRxiv, 2023 - biorxiv.org
Single-cell sequencing technology has enabled the characterization of cellular
heterogeneity at an unprecedented resolution. To analyze single-cell RNA-sequencing data …

Deep Learning in Single-Cell and Spatial Transcriptomics Data Analysis: Advances and Challenges from a Data Science Perspective

S Ge, S Sun, H Xu, Q Cheng, Z Ren - arXiv preprint arXiv:2412.03614, 2024 - arxiv.org
The development of single-cell and spatial transcriptomics has revolutionized our capacity to
investigate cellular properties, functions, and interactions in both cellular and spatial …

High-dimensional logistic regression with missing data: Imputation, regularization, and universality

KA Verchand, A Montanari - arXiv preprint arXiv:2410.01093, 2024 - arxiv.org
We study high-dimensional, ridge-regularized logistic regression in a setting in which the
covariates may be missing or corrupted by additive noise. When both the covariates and the …

Cancer molecular subtyping using limited multi-omics data with missingness

Y Bu, J Liang, Z Li, J Wang, J Wang… - PLOS Computational …, 2024 - journals.plos.org
Diagnosing cancer subtypes is a prerequisite for precise treatment. Existing multi-omics data
fusion-based diagnostic solutions build on the requisite of sufficient samples with complete …

Extrapolated cross-validation for randomized ensembles

JH Du, P Patil, K Roeder… - Journal of Computational …, 2024 - Taylor & Francis
Ensemble methods such as bagging and random forests are ubiquitous in various fields,
from finance to genomics. Despite their prevalence, the question of the efficient tuning of …