Authors
Jouni Sirén, Erik Garrison, Adam M Novak, Benedict Paten, Richard Durbin
Publication date
2018/5/10
Journal
arXiv preprint arXiv:1805.03834
Description
Motivation
The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes.
Results
We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.
Availability and implementation …
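As background for the Results above, the following is a minimal sketch of the ordinary (non-graph) positional Burrows–Wheeler transform on a small binary haplotype matrix. It is not the graph extension or the implementation described in the abstract. The function name `pbwt_prefix_orders` and the toy input are assumptions made for illustration only. At each site, the haplotypes are stably partitioned by their allele, which keeps them sorted by their reversed prefixes.

```python
from typing import List, Sequence


def pbwt_prefix_orders(haplotypes: Sequence[Sequence[int]]) -> List[List[int]]:
    """Compute positional prefix orders for a binary haplotype matrix.

    haplotypes[h][k] is the allele (0 or 1) of haplotype h at site k.
    The result has one entry per site plus a final one: entry k lists the
    haplotype indices sorted by their reversed prefixes over sites 0..k-1.
    (Hypothetical helper for illustration; plain PBWT, not the graph version.)
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    order = list(range(len(haplotypes)))  # before any site: input order
    orders = [order]
    for k in range(n_sites):
        # Stable partition: haplotypes with allele 0 first, then allele 1,
        # each group keeping its relative order from the previous site.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


if __name__ == "__main__":
    toy = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 1],
        [1, 0, 0],
    ]
    for k, order in enumerate(pbwt_prefix_orders(toy)):
        print(k, order)
```

Haplotypes that share a long match ending at a site end up adjacent in that site's order, which is what makes haplotype-aware queries efficient in this family of indexes.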
Total citations
[Chart: citations per year, 2019–2024]
Scholar articles
J Sirén, E Garrison, AM Novak, B Paten, R Durbin - Bioinformatics, 2020