Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format

K Kryukov, L Jin, S Nakagawa - Patterns, 2022 - cell.com
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome data are essential
for epidemiology, vaccine development, and tracking emerging variants. Millions of SARS …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences

K Kryukov, MT Ueda, S Nakagawa, T Imanishi - GigaScience, 2020 - academic.oup.com
Background Nearly all molecular sequence databases currently use gzip for data
compression. Ongoing rapid accumulation of stored data calls for a more efficient …

Persistent minimal sequences of SARS-CoV-2

D Pratas, JM Silva - Bioinformatics, 2020 - academic.oup.com
Motivation Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused
more than 14 million cases and more than half million deaths. Given the absence of …

Sousa

RCM PEREIRA - PA Fatores de mortalidade de micro e pequenas, 2018 - researchgate.net
The increasing availability of expressive quantities of human viral sequenced samples,
namely from clinical and forensic contexts, has led to the emergence of many optimized …

The complexity landscape of viral genomes

JM Silva, D Pratas, T Caetano, S Matos - GigaScience, 2022 - academic.oup.com
Background Viruses are among the shortest yet highly abundant species that harbor
minimal instructions to infect cells, adapt, multiply, and exist. However, with the current …

LEC-Codec: Learning-based genome data compression

Z Sun, M Wang, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we propose a Learning-based gEnome Codec (LEC), which is designed for
high efficiency and enhanced flexibility. The LEC integrates several advanced technologies …

GTO: a toolkit to unify pipelines in genomic and proteomic research

JR Almeida, AJ Pinho, JL Oliveira, O Fajarda, D Pratas - SoftwareX, 2020 - Elsevier
Next-generation sequencing triggered the production of a massive volume of publicly
available data and the development of new specialised tools. These tools are dispersed …

Comparative studies on the high-performance compression of SARS-CoV-2 genome collections

T Tang, J Li - Briefings in Functional Genomics, 2022 - academic.oup.com
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is fast mutating
worldwide. The mutated strains have been timely sequenced by worldwide labs …

JARVIS3: an efficient encoder for genomic data

MJP Sousa, AJ Pinho, D Pratas - Bioinformatics, 2024 - academic.oup.com
Motivation Large-scale genomic projects grapple with the complex challenge of reducing
medium- and long-term storage space and its associated energy consumption, monetary …