High speed BLASTN: an accelerated MegaBLAST search tool

Y Chen, W Ye, Y Zhang, Y Xu - Nucleic acids research, 2015 - academic.oup.com
Sequence alignment is a long-standing problem in bioinformatics. The Basic Local
Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools …

Efficient de novo assembly of large genomes using compressed data structures

JT Simpson, R Durbin - Genome research, 2012 - genome.cshlp.org
De novo genome sequence assembly is important both to generate new sequence
assemblies for previously uncharacterized genomes and to identify the genome sequence of …

Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly

H Li - Bioinformatics, 2012 - academic.oup.com
Motivation: Eugene Myers in his string graph paper suggested that in a string graph
or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also …

Efficient construction of an assembly string graph using the FM-index

JT Simpson, R Durbin - Bioinformatics, 2010 - academic.oup.com
Motivation: Sequence assembly is a difficult problem whose importance has grown again
recently as the cost of sequencing has dramatically dropped. Most new sequence assembly …

Storage and retrieval of highly repetitive sequence collections

V Mäkinen, G Navarro, J Sirén… - Journal of Computational …, 2010 - liebertpub.com
A repetitive sequence collection is a set of sequences which are small variations of each
other. A prominent example are genome sequences of individuals of the same or close …

Haplotype-aware graph indexes

J Sirén, E Garrison, AM Novak, B Paten… - Bioinformatics, 2020 - academic.oup.com
Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although
each path in the graph is a potential haplotype, most paths are non-biological, unlikely …

Lightweight algorithms for constructing and inverting the BWT of string collections

MJ Bauer, AJ Cox, G Rosone - Theoretical Computer Science, 2013 - Elsevier
Recent progress in the field of DNA sequencing motivates us to consider the problem of
computing the Burrows–Wheeler transform (BWT) of a collection of strings. A human …

Lightweight data indexing and compression in external memory

P Ferragina, T Gagie, G Manzini - Algorithmica, 2012 - Springer
In this paper we describe algorithms for computing the Burrows-Wheeler Transform (bwt)
and for building (compressed) indexes in external memory. The innovative feature of our …

Prospects and limitations of full-text index structures in genome analysis

M Vyverman, B De Baets, V Fack… - Nucleic acids …, 2012 - academic.oup.com
The combination of incessant advances in sequencing technology producing large amounts
of data and innovative bioinformatics approaches, designed to cope with this data flood, has …

DREAM-Yara: an exact read mapper for very large databases with short update time

TH Dadi, E Siragusa, VC Piro, A Andrusch… - …, 2018 - academic.oup.com
Motivation: Mapping-based approaches have become limited in their application to very
large sets of references since computing an FM-index for very large databases (e.g. >10 GB) …