Pan-genomics in the human genome era

RM Sherman, SL Salzberg - Nature Reviews Genetics, 2020 - nature.com
Since the early days of the genome era, the scientific community has relied on a single
'reference' genome for each species, which is used as the basis for a wide range of genetic …

Data structures based on k-mers for querying large collections of sequencing data sets

C Marchet, C Boucher, SJ Puglisi, P Medvedev… - Genome …, 2021 - genome.cshlp.org
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the
European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached …

Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs

G Holley, P Melsted - Genome biology, 2020 - Springer
Memory consumption of de Bruijn graphs is often prohibitive. Most de Bruijn graph-based
assemblers reduce the complexity by compacting paths into single vertices, but this is …

Ultrafast search of all deposited bacterial and viral genomic data

P Bradley, HC Den Bakker, EPC Rocha… - Nature …, 2019 - nature.com
Exponentially increasing amounts of unprocessed bacterial and viral genomic sequence
data are stored in the global archives. The ability to query these data for sequence search …

Metabolic framework of spontaneous and synthetic sourdough metacommunities to reveal microbial players responsible for resilience and performance

FM Calabrese, H Ameur, O Nikoloudaki, G Celano… - Microbiome, 2022 - Springer
Background: In nature, microbial communities undergo changes in composition that threaten
their resiliency. Here, we interrogated sourdough, a natural cereal-fermenting …

Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes

JN Alanko, J Vuohtoniemi, T Mäklin, SJ Puglisi - Bioinformatics, 2023 - academic.oup.com
Motivation: Huge datasets containing whole-genome sequences of bacterial strains are now
commonplace and represent a rich and important resource for modern genomic …

Mantis: a fast, small, and exact large-scale sequence-search index

P Pandey, F Almodaresi, MA Bender, M Ferdman… - Cell systems, 2018 - cell.com
Sequence-level searches on large collections of RNA sequencing experiments, such as the
NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the …

Genome-wide somatic variant calling using localized colored de Bruijn graphs

G Narzisi, A Corvelo, K Arora, EA Bergmann… - Communications …, 2018 - nature.com
Reliable detection of somatic variations is of critical importance in cancer research. Here we
present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and …

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

C Marchet, Z Iqbal, D Gautheret, M Salson… - …, 2020 - academic.oup.com
Motivation: In this work we present REINDEER, a novel computational method that performs
indexing of sequences and records their abundances across a collection of datasets. To the …

Current affairs of microbial genome-wide association studies: approaches, bottlenecks and analytical pitfalls

JE San, S Baichoo, A Kanzi, Y Moosa… - Frontiers in …, 2020 - frontiersin.org
Microbial genome-wide association studies (mGWAS) are a new and exciting research field
that is adapting human GWAS methods to understand how variations in microbial genomes …