Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets
In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …
dynamic and exact method for representing k-mer sets. Originating from Conway and …
Fulgor: a fast and compact k-mer index for large-scale matching and color queries
The problem of sequence identification or matching—determining the subset of reference
sequences from a given collection that are likely to contain a short, queried nucleotide …
sequences from a given collection that are likely to contain a short, queried nucleotide …
Engineering compact data structures for rank and select queries on bit vectors
F Kurpicz - International Symposium on String Processing and …, 2022 - Springer
Bit vectors are fundamental building blocks of succinct data structures used in compressed
text indices, eg, in the form of the wavelet trees. Here, two types of queries are of interest …
text indices, eg, in the form of the wavelet trees. Here, two types of queries are of interest …
Locality-preserving minimal perfect hashing of k-mers
Motivation Minimal perfect hashing is the problem of mapping a static set of n distinct keys
into the address space {1,…, n} bijectively. It is well-known that n log 2 (e) bits are necessary …
into the address space {1,…, n} bijectively. It is well-known that n log 2 (e) bits are necessary …
Prefix filter: Practically and theoretically better than bloom
T Even, G Even, A Morrison - arXiv preprint arXiv:2203.17139, 2022 - arxiv.org
Many applications of approximate membership query data structures, or filters, require only
an incremental filter that supports insertions but not deletions. However, the design space of …
an incremental filter that supports insertions but not deletions. However, the design space of …
[PDF][PDF] Faster wavelet trees with quad vectors
M Ceregini, F Kurpicz, R Venturini - CoRR, abs/2302.09239, 2023 - kurpicz.org
Given a text, rank and select queries return the number of occurrences of a character up to a
position (rank) or the position of a character with a given rank (select). These queries have …
position (rank) or the position of a character with a given rank (select). These queries have …
Faster Block Tree Construction
The block tree [Belazzougui et al. J. Comput. Syst. Sci.'21] is a compressed text index that
can answer access (extract a character at a position), rank (number of occurrences of a …
can answer access (extract a character at a position), rank (number of occurrences of a …
Faster Wavelet Tree Queries
M Ceregini, F Kurpicz… - 2024 Data Compression …, 2024 - ieeexplore.ieee.org
Given a text, rank and select queries return the number of occurrences of a character up to a
position (rank) or the position of a character with a given rank (select). These queries have …
position (rank) or the position of a character with a given rank (select). These queries have …
Efficient Integer Retrieving from Unordered Compressed Sequences
IO Zavadskyi - arXiv preprint arXiv:2302.05869, 2023 - arxiv.org
The variable-length Reverse Multi-Delimiter (RMD) codes are known to represent
sequences of unbounded and unordered integers. When applied to data compression, they …
sequences of unbounded and unordered integers. When applied to data compression, they …
Ext-LOUDS: A Space Efficient Extended LOUDS Index for Superset Query
L Jia, Y Zhang, J Ding, J You, Y Chen, R Li - Applied Sciences, 2020 - mdpi.com
Superset query is widely used in object-oriented databases, data mining, and many other
fields. Trie is an efficient index for superset query, whereas most existing trie index aim at …
fields. Trie is an efficient index for superset query, whereas most existing trie index aim at …