Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Computational graph pangenomics: a tutorial on data structures and their applications

JA Baaijens, P Bonizzoni, C Boucher… - Natural Computing, 2022 - Springer
Computational pangenomics is an emerging research field that is changing the way
computer scientists are facing challenges in biological sequence analysis. In past decades …

MONI: a pangenomic index for finding maximal exact matches

M Rossi, M Oliva, B Langmead, T Gagie… - Journal of …, 2022 - liebertpub.com
Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store
thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to …

SPUMONI 2: improved classification using a pangenome index of minimizer digests

OY Ahmed, M Rossi, T Gagie, C Boucher, B Langmead - Genome Biology, 2023 - Springer
Genomics analyses use large reference sequence collections, like pangenomes or
taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both …

Resolution of the burrows-wheeler transform conjecture

D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
Abstract The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

Computing MEMs and Relatives on Repetitive Text Collections

G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $ P [1.. m] $ on a large repetitive text collection $ T [1.. n] $, which is represented as a …

[HTML][HTML] Sensitivity of string compressors and repetitiveness measures

T Akagi, M Funakoshi, S Inenaga - Information and Computation, 2023 - Elsevier
The sensitivity of a string compression algorithm C asks how much the output size C (T) for
an input string T can increase when a single character edit operation is performed on T. This …

Optimal-time queries on BWT-runs compressed indexes

T Nishimoto, Y Tabei - arXiv preprint arXiv:2006.05104, 2020 - arxiv.org
Indexing highly repetitive strings (ie, strings with many repetitions) for fast queries has
become a central research topic in string processing, because it has a wide variety of …

A survey of BWT variants for string collections

D Cenzato, Z Lipták - arXiv preprint arXiv:2202.13235, 2022 - arxiv.org
In recent years, the focus of bioinformatics research has moved from individual sequences to
collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform …