Genomic data compression

M Hernaez, D Pavlichin, T Weissman… - Annual Review of …, 2019 - annualreviews.org
Recently, there has been growing interest in genome sequencing, driven by advances in
sequencing technology, in terms of both efficiency and affordability. These developments …

DeepZip: Lossless data compression using recurrent neural networks

M Goyal, K Tatwawadi, S Chandak, I Ochoa - arXiv preprint arXiv …, 2018 - arxiv.org
Sequential data is being generated at an unprecedented pace in various forms, including
text and genomic data. This creates the need for efficient compression mechanisms to …

Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format

K Kryukov, L Jin, S Nakagawa - Patterns, 2022 - cell.com
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome data are essential
for epidemiology, vaccine development, and tracking emerging variants. Millions of SARS …

A survey on data compression methods for biological sequences

M Hosseini, D Pratas, AJ Pinho - Information, 2016 - mdpi.com
The ever increasing growth of the production of high-throughput sequencing data poses a
serious challenge to the storage, processing and transmission of these data. As frequently …

DZip: Improved general-purpose lossless compression based on novel neural network modeling

M Goyal, K Tatwawadi, S Chandak… - 2021 data compression …, 2021 - ieeexplore.ieee.org
We consider lossless compression based on statistical data modeling followed by prediction-
based encoding, where an accurate statistical model for the input data leads to substantial …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background: The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

FQSqueezer: k-mer-based compression of sequencing data

S Deorowicz - Scientific reports, 2020 - nature.com
The amount of data produced by modern sequencing instruments that needs to be stored is
huge. Therefore it is not surprising that a lot of work has been done in the field of specialized …

Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences

K Kryukov, MT Ueda, S Nakagawa, T Imanishi - GigaScience, 2020 - academic.oup.com
Background: Nearly all molecular sequence databases currently use gzip for data
compression. Ongoing rapid accumulation of stored data calls for a more efficient …

Sousa

RCM PEREIRA - PA Fatores de mortalidade de micro e pequenas, 2018 - researchgate.net
The increasing availability of expressive quantities of human viral sequenced samples,
namely from clinical and forensic contexts, has led to the emergence of many optimized …

The complexity landscape of viral genomes

JM Silva, D Pratas, T Caetano, S Matos - GigaScience, 2022 - academic.oup.com
Background: Viruses are among the shortest yet highly abundant species that harbor
minimal instructions to infect cells, adapt, multiply, and exist. However, with the current …