Genomic data compression

M Hernaez, D Pavlichin, T Weissman… - Annual Review of …, 2019 - annualreviews.org
Recently, there has been growing interest in genome sequencing, driven by advances in
sequencing technology, in terms of both efficiency and affordability. These developments …

The visual story of data storage: From storage properties to user interfaces

A Anžel, D Heider, G Hattab - Computational and Structural Biotechnology …, 2021 - Elsevier
About fifty times more data has been created than there are stars in the observable universe.
Current trends in data creation and consumption mean that the devices and storage media …

Lossless indexing with counting de Bruijn graphs

M Karasikov, H Mustafa, G Rätsch, A Kahles - Genome Research, 2022 - genome.cshlp.org
Sequencing data are rapidly accumulating in public repositories. Making this resource
accessible for interactive analysis at scale requires efficient approaches for its storage and …
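The core idea behind a counting de Bruijn graph can be sketched in a few lines. In this sketch, k-mers from the reads act as graph nodes, each annotated with its abundance, and edges are implied wherever two k-mers overlap by k-1 characters. This is an assumption-laden illustration and not the method from the paper. The function names `count_kmers` and `successors` are hypothetical. A plain `Counter` also stands in for the compressed representations that large-scale indexes rely on, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Map every k-mer seen in the reads to its abundance (node weights)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def successors(counts, kmer):
    """Out-neighbours in the node-centric de Bruijn graph: indexed k-mers
    whose first k-1 characters equal the last k-1 characters of `kmer`.
    Assumes a DNA alphabet and ignores reverse complements."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in counts]


if __name__ == "__main__":
    kmers = count_kmers(["ACGTA", "CGTAC"], k=3)
    print(dict(kmers))                # {'ACG': 1, 'CGT': 2, 'GTA': 2, 'TAC': 1}
    print(successors(kmers, "GTA"))   # ['TAC']
```

In this sketch the edges are never stored, because they can be recovered by probing the four possible one-base extensions of a k-mer. Only the node set and its counts have to be kept.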

Genozip: a universal extensible genomic data compressor

D Lan, R Tobler, Y Souilmi, B Llamas - Bioinformatics, 2021 - academic.oup.com
We present Genozip, a universal and fully featured compression software for genomic data.
Genozip is designed to be a general-purpose software and a development framework for …

A survey of BWT variants for string collections

D Cenzato, Z Lipták - arXiv preprint arXiv:2202.13235, 2022 - arxiv.org
In recent years, the focus of bioinformatics research has moved from individual sequences to
collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform …
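One simple way to extend the Burrows-Wheeler Transform from a single string to a collection is to append a distinct end-marker to each string, concatenate the results, and take the BWT of the concatenation. The end-markers are ordered $1 < $2 < … and all sort before the sequence characters. This is only one of several possible variants, and they can yield different outputs for the same collection. The sketch below uses naive suffix sorting, so it is suitable for illustration only. The function name `collection_bwt` is hypothetical, and every end-marker is printed as `$`.

```python
def collection_bwt(strings):
    """BWT of S1$1 S2$2 ... Sm$m, where $1 < $2 < ... < every sequence character.

    Symbols are encoded as tuples so that end-markers (0, i) sort before
    characters (1, ch), and end-markers sort among themselves by string index.
    """
    text = []
    for i, s in enumerate(strings):
        text.extend((1, ch) for ch in s)
        text.append((0, i))  # distinct end-marker for string i
    # Naive suffix array: sort suffix start positions by the suffixes themselves.
    # Every suffix contains the unique final end-marker, so no suffix is a
    # proper prefix of another and the order is well defined.
    sa = sorted(range(len(text)), key=lambda j: text[j:])
    out = []
    for j in sa:
        sym = text[j - 1]  # j == 0 wraps to the last symbol (cyclic text)
        out.append("$" if sym[0] == 0 else sym[1])
    return "".join(out)


if __name__ == "__main__":
    print(collection_bwt(["BANANA"]))    # ANNB$AA (classic BWT of BANANA$)
    print(collection_bwt(["AC", "GA"]))  # CAG$A$
```

With a single string, this reduces to the classic BWT. With several strings, the BWT contains exactly one `$` per string, and its length equals the total number of characters plus the number of strings.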

FQSqueezer: k-mer-based compression of sequencing data

S Deorowicz - Scientific Reports, 2020 - nature.com
The amount of data produced by modern sequencing instruments that needs to be stored is
huge. Therefore it is not surprising that a lot of work has been done in the field of specialized …

CoLoRd: compressing long reads

M Kokot, A Gudyś, H Li, S Deorowicz - Nature Methods, 2022 - nature.com
The cost of maintaining exabytes of data produced by sequencing experiments every year
has become a major issue in today's genomic research. In spite of the increasing popularity …

Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences

K Kryukov, MT Ueda, S Nakagawa, T Imanishi - GigaScience, 2020 - academic.oup.com
Background: Nearly all molecular sequence databases currently use gzip for data
compression. Ongoing rapid accumulation of stored data calls for a more efficient …

Efficient sequencing data compression and FPGA acceleration based on a two-step framework

S Chen, Y Chen, Z Wang, W Qin, J Zhang… - Frontiers in …, 2023 - frontiersin.org
With the increasing throughput of modern sequencing instruments, the cost of storing and
transmitting sequencing data has also increased dramatically. Although many tools have …

An introduction to MPEG-G: the first open ISO/IEC standard for the compression and exchange of genomic sequencing data

J Voges, M Hernaez, M Mattavelli… - Proceedings of the …, 2021 - ieeexplore.ieee.org
The development and progress of high-throughput sequencing technologies have
transformed the sequencing of DNA from a scientific research challenge to practice. With the …