Weighted minimizer sampling improves long read mapping

C Jain, A Rhie, H Zhang, C Chu, BP Walenz… - …, 2020 - academic.oup.com
Motivation: In this era of exponential data growth, minimizer sampling has become a
standard algorithmic technique for rapid genome sequence comparison. This technique …

Creating and using minimizer sketches in computational genomics

H Zheng, G Marçais, C Kingsford - Journal of Computational …, 2023 - liebertpub.com
Processing large data sets has become an essential part of computational genomics.
Greatly increased availability of sequence data from multiple sources has fueled …

Theory of local k-mer selection with applications to long-read alignment

J Shaw, YW Yu - Bioinformatics, 2022 - academic.oup.com
Motivation: Selecting a subset of k-mers in a string in a local manner is a common task in
bioinformatics tools for speeding up computation. Arguably the most well-known and …

BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis

C Firtina, J Park, M Alser, JS Kim… - NAR Genomics and …, 2023 - academic.oup.com
Generating the hash values of short subsequences, called seeds, enables quickly
identifying similarities between genomic sequences by matching seeds with a single lookup …

Improved design and analysis of practical minimizers

H Zheng, C Kingsford, G Marçais - Bioinformatics, 2020 - academic.oup.com
Motivation: Minimizers are methods to sample k-mers from a string, with the guarantee that
a similar set of k-mers will be chosen on similar strings. It is parameterized by the k-mer length …

A near-tight lower bound on the density of forward sampling schemes

B Kille, R Groot Koerkamp, D McAdams, A Liu… - …, 2025 - academic.oup.com
Motivation: Sampling k-mers is a ubiquitous task in sequence analysis algorithms. Sampling
schemes such as the often-used random minimizer scheme are particularly appealing as …

Efficient minimizer orders for large values of k using minimum decycling sets

D Pellow, L Pu, B Ekim, L Kotlar, B Berger… - Genome …, 2023 - genome.cshlp.org
Minimizers are ubiquitously used in data structures and algorithms for efficient searching,
mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select …

The minimizer Jaccard estimator is biased and inconsistent

M Belbasi, A Blanca, RS Harris, D Koslicki… - …, 2022 - academic.oup.com
Motivation: Sketching is now widely used in bioinformatics to reduce data size and increase
data processing speed. Sketching approaches entice with improved scalability but also carry …

LexicHash: sequence similarity estimation via lexicographic comparison of hashes

G Greenberg, AN Ravi, I Shomorony - Bioinformatics, 2023 - academic.oup.com
Motivation: Pairwise sequence alignment is a heavy computational burden, particularly in the
context of third-generation sequencing technologies. This issue is commonly addressed by …

A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets

B Ekim, B Berger, Y Orenstein - International Conference on Research in …, 2020 - Springer
As the volume of next generation sequencing data increases, an urgent need for algorithms
to efficiently process the data arises. Universal hitting sets (UHS) were recently introduced …