The Statistics of k-mers from a Sequence Undergoing a Simple Mutation Process Without Spurious Matches

A Blanca, RS Harris, D Koslicki… - Journal of Computational …, 2022 - liebertpub.com
k-mer-based methods are widely used in bioinformatics, but there are many gaps in our
understanding of their statistical properties. Here, we consider the simple model where a …

KINN: an alignment-free accurate phylogeny reconstruction method based on inner distance distributions of k-mer pairs in biological sequences

R Tang, Z Yu, J Li - Molecular Phylogenetics and Evolution, 2023 - Elsevier
Alignment-based methods have faced disadvantages in sequence comparison and
phylogeny reconstruction due to their high computational complexity. Alignment-free …

VirusTaxo: Taxonomic classification of viruses from the genome sequence using k-mer enrichment

RS Raju, A Al Nahid, PC Dev, R Islam - Genomics, 2022 - Elsevier
Classification of viruses into their taxonomic ranks (e.g., order, family, and genus) provides a
framework to organize an abundant population of viruses. Next-generation metagenomic …

On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference

A Criscuolo - F1000Research, 2020 - ncbi.nlm.nih.gov
Recently developed MinHash-based techniques were proven successful in quickly
estimating the level of similarity between large nucleotide sequences. This article discusses …

Genome assembly composition of the String “ACGT” array: a review of data structure accuracy and performance challenges

SMMA Barakat, R Sallehuddin, SS Yuhaniz… - PeerJ Computer …, 2023 - peerj.com
Background: The development of sequencing technology increases the number of genomes
being sequenced. However, obtaining a quality genome sequence remains a challenge in …

Phylonium: fast estimation of evolutionary distances from large samples of similar genomes

F Klötzl, B Haubold - Bioinformatics, 2020 - academic.oup.com
Motivation: Tracking disease outbreaks by whole-genome sequencing leads to the collection
of large samples of closely related sequences. Five years ago, we published a method to …

Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage

AK Lau, S Dörrer, CA Leimeister, C Bleidorn… - BMC …, 2019 - Springer
Background: In many fields of biomedical research, it is important to estimate phylogenetic
distances between taxa based on low-coverage sequencing reads. Major applications are …

Identity: rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models

HZ Girgis, BT James, BB Luczak - NAR genomics and …, 2021 - academic.oup.com
Pairwise global alignment is a fundamental step in sequence analysis. Optimal alignment
algorithms are quadratic—slow especially on long sequences. In many applications that …

'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees

T Dencker, CA Leimeister, M Gerth… - NAR Genomics and …, 2020 - academic.oup.com
Word-based or 'alignment-free' methods for phylogeny inference have become popular in
recent years. These methods are much faster than traditional, alignment-based approaches …

KCOSS: an ultra-fast k-mer counter for assembled genome analysis

D Tang, Y Li, D Tan, J Fu, Y Tang, J Lin, R Zhao… - …, 2022 - academic.oup.com
Motivation: The k-mer frequency in whole genome sequences provides researchers with an
insightful perspective on genomic complexity, comparative genomics, metagenomics and …