Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades …
Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely …
Genomic epidemiology is a tool for tracing transmission of pathogens based on whole-genome sequencing. We introduce the mGEMS pipeline for genomic epidemiology with …
Availability of extensive genetic data across multiple individuals and populations is driving the growing importance of graph-based reference representations. Aligning sequences to …
Exact string matching in labeled graphs is the problem of searching paths of a graph G=(V, E) such that the concatenation of their node labels is equal to the given pattern string P [1 …
N Cotumaccio, N Prezza - Proceedings of the 2021 ACM-SIAM Symposium on …, 2021 - SIAM
An index for a finite automaton is a powerful data structure that supports locating paths labeled with a query pattern, thus solving pattern matching on the underlying regular …
JN Alanko, SJ Puglisi, J Vuohtoniemi - SIAM Conference on Applied and …, 2023 - SIAM
The κ-spectrum of a string is the set of all distinct substrings of length κ occurring in the string. This is a lossy but computationally convenient representation of the information in the …
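As an illustration of the definition in the snippet above, here is a minimal Python sketch of the κ-spectrum (written `k` in the code). The function name `k_spectrum` and the example strings are hypothetical and are not taken from the cited work; the sketch only shows why the representation is lossy, since two different strings can share one spectrum.

```python
def k_spectrum(text: str, k: int) -> set[str]:
    """Return the set of all distinct length-k substrings of text."""
    if k <= 0:
        raise ValueError("k must be positive")
    # If text is shorter than k, the range is empty and so is the spectrum.
    return {text[i:i + k] for i in range(len(text) - k + 1)}


# Two different strings with the same 3-spectrum: the spectrum records
# which substrings occur, not how often or in what order.
short = k_spectrum("banana", 3)
longer = k_spectrum("bananana", 3)
print(sorted(short))
print(short == longer)
```

Storing the spectrum as a plain set is only convenient for small inputs; it is an assumption of this sketch, not a statement about how the cited paper represents it.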
The string matching problem on a node-labeled graph G=(V, E) asks whether a given pattern string P equals the concatenation of node labels of some path in G. This is a basic …
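The decision problem stated in the snippet above can be sketched with a simple search over (node, matched prefix length) states. This is a naive baseline for illustration, not the method of any of the listed papers. The function name `matches_some_path`, the dictionary-based graph encoding, and the example graph are all hypothetical. The sketch assumes non-empty node labels, a non-empty pattern, and that a path may revisit nodes.

```python
from collections import deque


def matches_some_path(labels: dict[int, str],
                      edges: dict[int, list[int]],
                      pattern: str) -> bool:
    """Decide whether pattern equals the concatenated labels of some path.

    A state (v, i) means: some path ending at node v spells pattern[:i].
    Assumes every label is non-empty (nodes with empty labels are skipped).
    """
    # A path may start at any node whose label is a prefix of the pattern.
    start = [(v, len(lab)) for v, lab in labels.items()
             if lab and pattern.startswith(lab)]
    seen = set(start)
    queue = deque(start)
    while queue:
        v, i = queue.popleft()
        if i == len(pattern):
            return True  # the whole pattern has been spelled
        for w in edges.get(v, []):
            lab = labels[w]
            # Extend the path by w if its label matches at position i.
            if lab and pattern.startswith(lab, i):
                state = (w, i + len(lab))
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False


# Hypothetical example graph: 0 -> 1 -> 3 and 0 -> 2 -> 3.
labels = {0: "A", 1: "C", 2: "G", 3: "T"}
edges = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(matches_some_path(labels, edges, "ACT"))
print(matches_some_path(labels, edges, "ACG"))
```

Because each state is expanded at most once, the search terminates even on cyclic graphs.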
N Cotumaccio - 2022 Data Compression Conference (DCC), 2022 - ieeexplore.ieee.org
For the first time we provide a succinct pattern matching index for arbitrary graphs that can be built in polynomial time, while improving both space and query time bounds from SODA …