Emerging approaches to DNA data storage: challenges and prospects

A Doricchi, CM Platnich, A Gimpel, F Horn, M Earle… - ACS …, 2022 - ACS Publications
With the total amount of worldwide data skyrocketing, the global data storage demand is
predicted to grow to 1.75 × 10^14 GB by 2025. Traditional storage methods have difficulties …

Survey and taxonomy of lossless graph compression and space-efficient graph representations

M Besta, T Hoefler - arXiv preprint arXiv:1806.01799, 2018 - arxiv.org
Various graphs such as web or social networks may contain up to trillions of edges.
Compressing such datasets can accelerate graph processing by reducing the amount of I/O …

Representation of k-Mer Sets Using Spectrum-Preserving String Sets

A Rahman, P Medvedev - Journal of Computational Biology, 2021 - liebertpub.com
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to
represent a set of k-mers is important for improving the scalability of bioinformatics analyses …

Genomic data compression

M Hernaez, D Pavlichin, T Weissman… - Annual Review of …, 2019 - annualreviews.org
Recently, there has been growing interest in genome sequencing, driven by advances in
sequencing technology, in terms of both efficiency and affordability. These developments …

Information theory in computational biology: where we stand today

P Chanda, E Costa, J Hu, S Sukumar, J Van Hemert… - Entropy, 2020 - mdpi.com
“A Mathematical Theory of Communication” was published in 1948 by Claude Shannon to
address the problems in the field of data compression and communication over (noisy) …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background: The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

Enhancing metagenomic classification with compression-based features

JM Silva, JR Almeida - Artificial Intelligence in Medicine, 2024 - Elsevier
Metagenomics is a rapidly expanding field that uses next-generation sequencing technology
to analyze the genetic makeup of environmental samples. However, accurately identifying …

Disk compression of k-mer sets

A Rahman, R Chikhi, P Medvedev - Algorithms for Molecular Biology, 2021 - Springer
K-mer based methods have become prevalent in many areas of bioinformatics. In
applications such as database search, they often work with large multi-terabyte-sized …

Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences

K Kryukov, MT Ueda, S Nakagawa, T Imanishi - GigaScience, 2020 - academic.oup.com
Background: Nearly all molecular sequence databases currently use gzip for data
compression. Ongoing rapid accumulation of stored data calls for a more efficient …

Efficient and robust search of microbial genomes via phylogenetic compression

K Břinda, L Lima, S Pignotti, N Quinones-Olvera… - …, 2024 - pmc.ncbi.nlm.nih.gov
Comprehensive collections approaching millions of sequenced genomes have become
central information sources in the life sciences. However, the rapid growth of these …