Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons

I Choi, AJ Ponsero, M Bomhoff, K Youens-Clark… - …, 2019 - academic.oup.com
Background Shotgun metagenomics provides powerful insights into microbial community
biodiversity and function. Yet, inferences from metagenomic studies are often limited by …

The parallelism motifs of genomic data analysis

K Yelick, A Buluç, M Awan, A Azad… - … of the Royal …, 2020 - royalsocietypublishing.org
Genomic datasets are growing dramatically as the cost of sequencing continues to decline
and small sequencing devices become available. Enormous community databases store …

A survey of algorithms for transforming molecular dynamics data into metadata for in situ analytics based on machine learning methods

M Taufer, T Estrada, T Johnston - … Transactions of the …, 2020 - royalsocietypublishing.org
This paper presents the survey of three algorithms to transform atomic-level molecular
snapshots from molecular dynamics (MD) simulations into metadata representations that are …

Counting kmers for biological sequences at large scale

J Ge, J Meng, N Guo, Y Wei, P Balaji… - Interdisciplinary Sciences …, 2020 - Springer
Counting the abundance of all the distinct kmers in biological sequence data is a
fundamental step in bioinformatics. These applications include de novo genome assembly …

Distributed-memory k-mer counting on GPUs

I Nisa, P Pandey, M Ellis, L Oliker… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
A fundamental step in many bioinformatics computations is to count the frequency of
fixed-length sequences, called k-mers, a problem that has received considerable attention as an …

Pakman: a scalable algorithm for generating genomic contigs on distributed memory machines

P Ghosh, S Krishnamoorthy… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
De novo genome assembly is a fundamental problem in the field of bioinformatics that aims
to assemble the DNA sequence of an unknown genome from numerous short DNA …

A Distributed Alignment-free Pipeline for Human SNPs Genotyping

L Di Rocco, U Ferraro Petrillo - … of the 14th ACM International Conference …, 2023 - dl.acm.org
Identification of known genetic traits and disease-related variants within an individual
requires a fundamental task: genotyping a set of variants from a database. However, the …

KmerCo: A lightweight K-mer counting technique with a tiny memory footprint

S Nayak, R Patgiri - arXiv preprint arXiv:2305.07545, 2023 - arxiv.org
K-mer counting is a requisite process for DNA assembly because it speeds up its overall
process. The frequency of K-mers is used for estimating the parameters of DNA assembly …

On the power of combiner optimizations in MapReduce over MPI workflows

T Gao, Y Guo, B Zhang, P Cicotti, Y Lu… - 2018 IEEE 24th …, 2018 - ieeexplore.ieee.org
Analyzing large volumes of data is becoming more and more important in various scientific
computing domains. MapReduce over MPI frameworks are an appealing solution to enable …

Kcollections: A fast and efficient library for k-mers

MS Fujimoto, CA Lyman… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to
store and use efficiently because the number of k-mers increases exponentially as k …