Benchmarking of alignment-free sequence comparison methods

A Zielezinski, HZ Girgis, G Bernard, CA Leimeister… - Genome Biology, 2019 - Springer
Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven
by data-intensive applications. Hence, many AF procedures have been proposed in recent …

Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification

K Bohmann, S Mirarab, V Bafna, MTP Gilbert - 2020 - Wiley Online Library
Genetic tools are increasingly used to identify and discriminate between species. One key
transition in this process was the recognition of the potential of the ca. 658 bp fragment of the …

CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing

AOB Şapcı, E Rachtman, S Mirarab - Bioinformatics, 2024 - academic.oup.com
Motivation: Taxonomic classification of short reads and taxonomic profiling of metagenomic
samples are well-studied yet challenging problems. The presence of species belonging to …

SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences

CFW Chow, S Ghosh, A Hadarovich… - Proceedings of the …, 2024 - pnas.org
Intrinsically disordered regions (IDRs) are structurally flexible protein segments with
regulatory functions in multiple contexts, such as in the assembly of biomolecular …

Success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation and phylogeny

DR Forsdyke - Biological Journal of the Linnean Society, 2019 - academic.oup.com
The utility of DNA sequence substrings (k-mers) in alignment-free phylogenetic
classification, including that of bacteria and viruses, is increasingly recognized. However, its …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background: The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances

S Röhling, A Linne, J Schellhorn, M Hosseini… - PLOS ONE, 2020 - journals.plos.org
We study the number N_k of length-k word matches between pairs of evolutionarily related
DNA sequences, as a function of k. We show that the Jukes-Cantor distance between two …

Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling

E Rachtman, S Sarmashghi, V Bafna, S Mirarab - Cell Systems, 2022 - cell.com
Computing distance between two genomes without alignments or even access to
assemblies enables many downstream analyses. However, alignment-free methods, including in …

CONSULT: accurate contamination removal using locality-sensitive hashing

E Rachtman, V Bafna, S Mirarab - NAR Genomics and …, 2021 - academic.oup.com
A fundamental question appears in many bioinformatics applications: Does a sequencing
read belong to a large dataset of genomes from some broad taxonomic group, even when …

High-Throughput Genomic Data Reveal Complex Phylogenetic Relationships in Stylosanthes Sw. (Leguminosae)

MAS Oliveira, T Nunes, MA Dos Santos… - Frontiers in …, 2021 - frontiersin.org
Allopolyploidy is widely present across plant lineages. Though estimating the correct
phylogenetic relationships and origin of allopolyploids may sometimes become a hard task …