A survey of mapping algorithms in the long-reads era

K Sahlin, T Baudeau, B Cazaux, C Marchet - Genome Biology, 2023 - Springer
It has been over a decade since the first publication of a method dedicated entirely to
mapping long-reads. The distinctive characteristics of long reads resulted in methods …

Visualization of very large high-dimensional data sets as minimum spanning trees

D Probst, JL Reymond - Journal of Cheminformatics, 2020 - Springer
The chemical sciences are producing an unprecedented amount of large, high-dimensional
data sets containing chemical structures and associated properties. However, there are …

Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer

B Ekim, B Berger, R Chikhi - Cell Systems, 2021 - cell.com
DNA sequencing data continue to progress toward longer reads with increasingly lower
sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of …

To petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics

RAL Elworth, Q Wang, PK Kota… - Nucleic acids …, 2020 - academic.oup.com
As computational biologists continue to be inundated by ever increasing amounts of
metagenomic data, the need for data analysis approaches that keep up with the pace of …

Genomic sketching with multiplicities and locality-sensitive hashing using Dashing 2

DN Baker, B Langmead - Genome Research, 2023 - genome.cshlp.org
A genomic sketch is a small, probabilistic representation of the set of k-mers in a sequencing
data set. Sketches are building blocks for large-scale analyses that consider similarities …

GSearch: ultra-fast and scalable genome search by combining K-mer hashing with hierarchical navigable small world graphs

J Zhao, JP Both, LM Rodriguez-R… - Nucleic Acids …, 2024 - academic.oup.com
Genome search and/or classification typically involves finding the best-match database
(reference) genomes and has become increasingly challenging due to the growing number …

EDAM: Edit distance tolerant approximate matching content addressable memory

R Hanhan, E Garzón, Z Jahshan, A Teman… - Proceedings of the 49th …, 2022 - dl.acm.org
We propose a novel edit distance-tolerant content addressable memory (EDAM) for
energy-efficient approximate search applications. Unlike state-of-the-art approximate search …

DASH-CAM: dynamic approximate search content addressable memory for genome classification

Z Jahshan, I Merlin, E Garzón, L Yavits - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
We propose a novel dynamic storage-based approximate search content addressable
memory (DASH-CAM) for computational genomics applications, particularly for identification …

Drug-target interactions prediction based on network topology feature representation embedded deep forest

M Lian, X Wang, W Du - Neurocomputing, 2023 - Elsevier
Identifying drug-target interactions (DTIs) is instructive in drug design and disease treatment.
Existing studies typically used the properties of nodes (drug chemical structure and protein …

Hamming distance tolerant content-addressable memory (HD-CAM) for DNA classification

E Garzón, R Golman, Z Jahshan, R Hanhan… - IEEE …, 2022 - ieeexplore.ieee.org
This paper proposes a novel Hamming distance tolerant content-addressable memory
(HD-CAM) for energy-efficient in-memory approximate matching applications. HD-CAM exploits …