Processing large data sets has become an essential part of computational genomics. Greatly increased availability of sequence data from multiple sources has fueled …
DA Yoo, A Rhie, P Hebbar, F Antonacci… - BioRxiv, 2024 - pmc.ncbi.nlm.nih.gov
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan …
Motivation Given a string S, a minimizer scheme is an algorithm defined by a triple (k, w, 𝒪) that samples a subset of k-mers (k-long substrings) from a string S. Specifically, it samples …
Motivation Sampling k-mers is a ubiquitous task in sequence analysis algorithms. Sampling schemes such as the often-used random minimizer scheme are particularly appealing as …
Motivation A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have …
Streptococcus pneumoniae is an opportunistic human pathogen responsible for high morbidity and mortality rates. Extensive genome sequencing revealed its large pangenome …
Whole genome duplication (WGD) is a powerful evolutionary mechanism in plants. Autopolyploids have been comparatively less studied than allopolyploids, with sexual …
H Li, R Durbin - Nature Reviews Genetics, 2024 - nature.com
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly—the process of reconstructing the genome sequence of an organism …
Sampling algorithms that deterministically select a subset of k-mers are an important building block in bioinformatics applications. For example, they are used to index large …