Data structures based on k-mers for querying large collections of sequencing data sets

C Marchet, C Boucher, SJ Puglisi, P Medvedev… - Genome …, 2021 - genome.cshlp.org
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the
European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached …

Survey and taxonomy of lossless graph compression and space-efficient graph representations

M Besta, T Hoefler - arXiv preprint arXiv:1806.01799, 2018 - arxiv.org
Various graphs such as web or social networks may contain up to trillions of edges.
Compressing such datasets can accelerate graph processing by reducing the amount of I/O …

The design and construction of reference pangenome graphs with minigraph

H Li, X Feng, C Chu - Genome Biology, 2020 - Springer
The recent advances in sequencing technologies enable the assembly of individual
genomes to the quality of the reference genome. How to integrate multiple genomes from …

Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs

G Holley, P Melsted - Genome Biology, 2020 - Springer
Memory consumption of de Bruijn graphs is often prohibitive. Most de Bruijn graph-based
assemblers reduce the complexity by compacting paths into single vertices, but this is …

Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT

A Cracco, AI Tomescu - Genome Research, 2023 - genome.cshlp.org
Compacted de Bruijn graphs are one of the most fundamental data structures in
computational genomics. Colored compacted de Bruijn graphs are a variant built on a …

Mantis: a fast, small, and exact large-scale sequence-search index

P Pandey, F Almodaresi, MA Bender, M Ferdman… - Cell Systems, 2018 - cell.com
Sequence-level searches on large collections of RNA sequencing experiments, such as the
NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the …

COBS: a compact bit-sliced signature index

T Bingmann, P Bradley, F Gauger, Z Iqbal - String Processing and …, 2019 - Springer
We present COBS, a COmpact Bit-sliced Signature index, which is a cross-over between an
inverted index and Bloom filters. Our target application is to index k-mers of DNA samples or …

MetaGraph: Indexing and analysing nucleotide archives at petabase-scale

M Karasikov, H Mustafa, D Danciu, C Barber… - bioRxiv, 2020 - biorxiv.org
The amount of biological sequencing data available in public repositories is growing
exponentially, forming an invaluable biomedical research resource. Yet, making all this …

A space and time-efficient index for the compacted colored de Bruijn graph

F Almodaresi, H Sarkar, A Srivastava, R Patro - Bioinformatics, 2018 - academic.oup.com
Motivation: Indexing reference sequences for search—both individual genomes and
collections of genomes—is an important building block for many sequence analysis tasks …

Data Structures to Represent a Set of k-long DNA Sequences

R Chikhi, J Holub, P Medvedev - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The analysis of biological sequencing data has been one of the biggest applications of
string algorithms. The approaches used in many such applications are based on the …