Information theory applications for biological sequence analysis

S Vinga - Briefings in bioinformatics, 2014 - academic.oup.com
Abstract Information theory (IT) addresses the analysis of communication systems and has
been widely applied in molecular biology. In particular, alignment-free sequence analysis …

Data compression for sequencing data

S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer
Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …

Compression of FASTQ and SAM format sequencing data

JK Bonfield, MV Mahoney - PloS one, 2013 - journals.plos.org
Storage and transmission of the data produced by modern DNA sequencing instruments has
become a major concern, which prompted the Pistoia Alliance to pose the …

Storage and retrieval of highly repetitive sequence collections

V Mäkinen, G Navarro, J Sirén… - Journal of Computational …, 2010 - liebertpub.com
A repetitive sequence collection is a set of sequences which are small variations of each
other. A prominent example are genome sequences of individuals of the same or close …

Compression of DNA sequence reads in FASTQ format

S Deorowicz, S Grabowski - Bioinformatics, 2011 - academic.oup.com
Motivation: Modern sequencing instruments are able to generate at least hundreds of
millions short reads of genomic data. Those huge volumes of data require effective means to …

Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform

AJ Cox, MJ Bauer, T Jakobi, G Rosone - Bioinformatics, 2012 - academic.oup.com
Abstract Motivation: The Burrows–Wheeler transform (BWT) is the foundation of many
algorithms for compression and indexing of text data, but the cost of computing the BWT of …

GReEn: a tool for efficient compression of genome resequencing data

AJ Pinho, D Pratas, SP Garcia - Nucleic acids research, 2012 - academic.oup.com
Research in the genomic sciences is confronted with the volume of sequencing and
resequencing data increasing at a higher pace than that of data storage and communication …

DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm

Z Zhu, J Zhou, Z Ji, YH Shi - IEEE transactions on evolutionary …, 2011 - ieeexplore.ieee.org
With the rapid development of high-throughput DNA sequencing technologies, the amount
of DNA sequence data is accumulating exponentially. The huge influx of data creates new …

Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors

CA Miller, SH Settle, EP Sulman, KD Aldape… - BMC medical …, 2011 - Springer
Background Assays of multiple tumor samples frequently reveal recurrent genomic
aberrations, including point mutations and copy-number alterations, that affect individual …

High-throughput DNA sequence data compression

Z Zhu, Y Zhang, Z Ji, S He, X Yang - Briefings in bioinformatics, 2015 - academic.oup.com
The exponential growth of high-throughput DNA sequence data has posed great challenges
to genomic data storage, retrieval and transmission. Compression is a critical tool to address …