Survey and taxonomy of lossless graph compression and space-efficient graph representations

M Besta, T Hoefler - arXiv preprint arXiv:1806.01799, 2018 - arxiv.org
Various graphs such as web or social networks may contain up to trillions of edges.
Compressing such datasets can accelerate graph processing by reducing the amount of I/O …

Data Structures to Represent a Set of k-long DNA Sequences

R Chikhi, J Holub, P Medvedev - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The analysis of biological sequencing data has been one of the biggest applications of
string algorithms. The approaches used in many such applications are based on the …

Succinct de Bruijn graphs

A Bowe, T Onodera, K Sadakane, T Shibuya - International workshop on …, 2012 - Springer
We propose a new succinct de Bruijn graph representation. If the de Bruijn graph of k-mers
in a DNA sequence of length N has m edges, it can be represented in 4m + o(m) bits. This is …

Mantis: a fast, small, and exact large-scale sequence-search index

P Pandey, F Almodaresi, MA Bender, M Ferdman… - Cell systems, 2018 - cell.com
Sequence-level searches on large collections of RNA sequencing experiments, such as the
NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the …

Representation of k-Mer Sets Using Spectrum-Preserving String Sets

A Rahman, P Medvedev - Journal of Computational Biology, 2021 - liebertpub.com
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to
represent a set of k-mers is important for improving the scalability of bioinformatics analyses …

A space and time-efficient index for the compacted colored de Bruijn graph

F Almodaresi, H Sarkar, A Srivastava, R Patro - Bioinformatics, 2018 - academic.oup.com
Motivation: Indexing reference sequences for search—both individual genomes and
collections of genomes—is an important building block for many sequence analysis tasks …

deBGR: an efficient and near-exact representation of the weighted de Bruijn graph

P Pandey, MA Bender, R Johnson, R Patro - Bioinformatics, 2017 - academic.oup.com
Motivation: Almost all de novo short-read genome and transcriptome assemblers start by
building a representation of the de Bruijn Graph of the reads they are given as input. Even …

Practical dynamic de Bruijn graphs

VG Crawford, A Kuhnle, C Boucher, R Chikhi… - …, 2018 - academic.oup.com
Motivation: The de Bruijn graph is fundamental to the analysis of next generation
sequencing data and so, as datasets of DNA reads grow rapidly, it becomes more important …

deGSM: memory scalable construction of large scale de Bruijn graph

H Guo, Y Fu, Y Gao, J Li, Y Wang… - IEEE/ACM transactions …, 2019 - ieeexplore.ieee.org
The de Bruijn graph, a fundamental data structure to represent and organize genome
sequence, plays important roles in various kinds of sequence analysis tasks. With the rapid …

Tight bounds for monotone minimal perfect hashing

S Assadi, M Farach-Colton, W Kuszmaul - … of the 2023 Annual ACM-SIAM …, 2023 - SIAM
The monotone minimal perfect hash function (MMPHF) problem is the following indexing
problem. Given a set S = {s_1, …, s_n} of n distinct keys from a universe U of size u, create a data …