Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets

I Martayan, B Cazaux, A Limasset, C Marchet - Bioinformatics, 2024 - academic.oup.com
In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …

Fulgor: a fast and compact k-mer index for large-scale matching and color queries

J Fan, J Khan, NP Singh, GE Pibiri, R Patro - Algorithms for Molecular …, 2024 - Springer
The problem of sequence identification or matching—determining the subset of reference
sequences from a given collection that are likely to contain a short, queried nucleotide …

Engineering compact data structures for rank and select queries on bit vectors

F Kurpicz - International Symposium on String Processing and …, 2022 - Springer
Bit vectors are fundamental building blocks of succinct data structures used in compressed
text indices, eg, in the form of the wavelet trees. Here, two types of queries are of interest …

Locality-preserving minimal perfect hashing of k-mers

GE Pibiri, Y Shibuya, A Limasset - Bioinformatics, 2023 - academic.oup.com
Motivation Minimal perfect hashing is the problem of mapping a static set of n distinct keys
into the address space {1,…, n} bijectively. It is well-known that n log 2 (e) bits are necessary …

Prefix filter: Practically and theoretically better than bloom

T Even, G Even, A Morrison - arXiv preprint arXiv:2203.17139, 2022 - arxiv.org
Many applications of approximate membership query data structures, or filters, require only
an incremental filter that supports insertions but not deletions. However, the design space of …

[PDF][PDF] Faster wavelet trees with quad vectors

M Ceregini, F Kurpicz, R Venturini - CoRR, abs/2302.09239, 2023 - kurpicz.org
Given a text, rank and select queries return the number of occurrences of a character up to a
position (rank) or the position of a character with a given rank (select). These queries have …

Faster Block Tree Construction

D Köppl, F Kurpicz, D Meyer - 31st Annual European Symposium …, 2023 - drops.dagstuhl.de
The block tree [Belazzougui et al. J. Comput. Syst. Sci.'21] is a compressed text index that
can answer access (extract a character at a position), rank (number of occurrences of a …

Faster Wavelet Tree Queries

M Ceregini, F Kurpicz… - 2024 Data Compression …, 2024 - ieeexplore.ieee.org
Given a text, rank and select queries return the number of occurrences of a character up to a
position (rank) or the position of a character with a given rank (select). These queries have …

Efficient Integer Retrieving from Unordered Compressed Sequences

IO Zavadskyi - arXiv preprint arXiv:2302.05869, 2023 - arxiv.org
The variable-length Reverse Multi-Delimiter (RMD) codes are known to represent
sequences of unbounded and unordered integers. When applied to data compression, they …

Ext-LOUDS: A Space Efficient Extended LOUDS Index for Superset Query

L Jia, Y Zhang, J Ding, J You, Y Chen, R Li - Applied Sciences, 2020 - mdpi.com
Superset query is widely used in object-oriented databases, data mining, and many other
fields. Trie is an efficient index for superset query, whereas most existing trie index aim at …