SALAI-Net: species-agnostic local ancestry inference network

B Oriol Sabat, D Mas Montserrat, X Giro-i-Nieto… - Bioinformatics, 2022 - academic.oup.com
Motivation
Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications, including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, and require re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications.
Results
We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity-by-descent methods, SALAI-Net estimates population labels for each segment of DNA using a reference-matching approach, which makes the technique both interpretable and fast. We benchmark our models on human whole-genome data and test their ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy while generalizing across different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods.
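The reference-matching idea can be illustrated with a toy sketch. This is not SALAI-Net's implementation. The ±1 allele encoding, the fixed non-overlapping windows, and scoring each population by its single best-matching reference haplotype are all simplifying assumptions chosen for clarity. The function names `window_similarity` and `infer_ancestry` are hypothetical.

```python
from collections import defaultdict


def encode(hap):
    """Map 0/1 alleles to -1/+1 so a matching site scores +1 and a mismatch -1."""
    return [2 * a - 1 for a in hap]


def window_similarity(query, ref, window):
    """Mean per-site agreement between a query and one reference haplotype, per window."""
    if len(query) != len(ref):
        raise ValueError("query and reference must cover the same SNPs")
    prod = [a * b for a, b in zip(encode(query), encode(ref))]
    scores = []
    for start in range(0, len(prod), window):
        chunk = prod[start:start + window]
        scores.append(sum(chunk) / len(chunk))  # ranges from -1.0 to 1.0
    return scores


def infer_ancestry(query, references, labels, window=4):
    """Label each window with the population of its best-matching reference.

    references: list of reference haplotypes (lists of 0/1 alleles)
    labels: population label for each reference haplotype
    """
    if not references:
        raise ValueError("at least one reference haplotype is required")
    per_pop = defaultdict(list)
    for ref, label in zip(references, labels):
        per_pop[label].append(window_similarity(query, ref, window))
    n_windows = len(next(iter(per_pop.values()))[0])
    calls = []
    for w in range(n_windows):
        # Score each population by its highest-scoring reference in this window.
        best = {pop: max(s[w] for s in sims) for pop, sims in per_pop.items()}
        calls.append(max(best, key=best.get))
    return calls


# Toy example: population A carries 0s then 1s, population B the reverse.
refs = [
    [0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 1],  # population A
    [1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 1],  # population B
]
labels = ["A", "A", "B", "B"]
query = [0, 0, 0, 0, 0, 0, 0, 1]
print(infer_ancestry(query, refs, labels, window=4))  # prints ['A', 'B']
```

Because each call is traced back to specific reference haplotypes and their similarity scores, a matching scheme of this kind is easy to inspect. A practical tool would likely also smooth calls across neighbouring windows; that step is omitted here.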
Availability and implementation
We provide an open-source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. The data are publicly available from the following sources: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
Supplementary information
Supplementary data are available from Bioinformatics online.
Oxford University Press