Genomic data compression

M Hernaez, D Pavlichin, T Weissman… - Annual Review of …, 2019 - annualreviews.org
Recently, there has been growing interest in genome sequencing, driven by advances in
sequencing technology, in terms of both efficiency and affordability. These developments …

SPRING: a next-generation compressor for FASTQ data

S Chandak, K Tatwawadi, I Ochoa, M Hernaez… - …, 2019 - academic.oup.com
Motivation: High-throughput sequencing technologies produce huge amounts of
data in the form of short genomic reads, associated quality values and read identifiers …

FQSqueezer: k-mer-based compression of sequencing data

S Deorowicz - Scientific Reports, 2020 - nature.com
The amount of data produced by modern sequencing instruments that needs to be stored is
huge. Therefore it is not surprising that a lot of work has been done in the field of specialized …

Productive visualization of high-throughput sequencing data using the SeqCode open portable platform

E Blanco, M González-Ramírez, L Di Croce - Scientific Reports, 2021 - nature.com
Large-scale sequencing techniques to chart genomes are entirely consolidated. Stable
computational methods to perform primary tasks such as quality control, read mapping, peak …

CoLoRd: compressing long reads

M Kokot, A Gudyś, H Li, S Deorowicz - Nature Methods, 2022 - nature.com
The cost of maintaining exabytes of data produced by sequencing experiments every year
has become a major issue in today's genomic research. In spite of the increasing popularity …

Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression

Y Liu, Z Yu, ME Dinger, J Li - Bioinformatics, 2019 - academic.oup.com
Motivation: Advanced high-throughput sequencing technologies have produced massive
amounts of read data, and algorithms have been specially designed to contract the size of …

An introduction to MPEG-G: the first open ISO/IEC standard for the compression and exchange of genomic sequencing data

J Voges, M Hernaez, M Mattavelli… - Proceedings of the …, 2021 - ieeexplore.ieee.org
The development and progress of high-throughput sequencing technologies have
transformed the sequencing of DNA from a scientific research challenge to practice. With the …

LFastqC: A lossless non-reference-based FASTQ compressor

S Al Yami, CH Huang - PLoS One, 2019 - journals.plos.org
The cost-effectiveness of next-generation sequencing (NGS) has led to the advancement of
genomic research, thereby regularly generating a large amount of raw data that often …

PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering

H Sun, Y Zheng, H Xie, H Ma, X Liu, G Wang - BMC Bioinformatics, 2023 - Springer
Background: Genomic sequencing reads compressors are essential for balancing high-
throughput sequencing short reads generation speed, large-scale genomic data sharing …

GenoDedup: Similarity-based deduplication and delta-encoding for genome sequencing data

V Cogo, J Paulo, A Bessani - IEEE Transactions on Computers, 2020 - ieeexplore.ieee.org
The vast datasets produced in human genomics must be efficiently stored, transferred, and
processed while prioritizing storage space and restore performance. Balancing these two …