M Besta, T Hoefler - arXiv preprint arXiv:1806.01799, 2018 - arxiv.org
Various graphs such as web or social networks may contain up to trillions of edges. Compressing such datasets can accelerate graph processing by reducing the amount of I/O …
The recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from …
Memory consumption of de Bruijn graphs is often prohibitive. Most de Bruijn graph-based assemblers reduce the complexity by compacting paths into single vertices, but this is …
Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a …
Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the …
We present COBS, a COmpact Bit-sliced Signature index, which is a cross-over between an inverted index and Bloom filters. Our target application is to index k-mers of DNA samples or …
The amount of biological sequencing data available in public repositories is growing exponentially, forming an invaluable biomedical research resource. Yet, making all this …
Motivation Indexing reference sequences for search—both individual genomes and collections of genomes—is an important building block for many sequence analysis tasks …
R Chikhi, J Holub, P Medvedev - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the …