Information theory applications for biological sequence analysis

S Vinga - Briefings in bioinformatics, 2014 - academic.oup.com
Information theory (IT) addresses the analysis of communication systems and has
been widely applied in molecular biology. In particular, alignment-free sequence analysis …

Entropy and information approaches to genetic diversity and its expression: genomic geography

WB Sherwin - Entropy, 2010 - mdpi.com
This article highlights advantages of entropy-based genetic diversity measures, at levels
from gene expression to landscapes. Shannon's entropy-based diversity is the standard for …

Towards parameter-free data mining

E Keogh, S Lonardi, CA Ratanamahatana - Proceedings of the tenth …, 2004 - dl.acm.org
Most data mining algorithms require the setting of many input parameters. Two main
dangers of working with parameter-laden algorithms are the following. First, incorrect …

DNA-based cryptography

A Gehani, T LaBean, J Reif - … of molecular computing: essays dedicated to …, 2004 - Springer
Recent research has considered DNA as a medium for ultra-scale computation and for ultra-
compact information storage. One potential key application is DNA-based, molecular …

A new sequence distance measure for phylogenetic tree construction

HH Otu, K Sayood - Bioinformatics, 2003 - academic.oup.com
Motivation: Most existing approaches for phylogenetic inference use multiple alignment of
sequences and assume some sort of an evolutionary model. The multiple alignment strategy …

A simple statistical algorithm for biological sequence compression

MD Cao, TI Dix, L Allison… - 2007 Data Compression …, 2007 - ieeexplore.ieee.org
This paper introduces a novel algorithm for biological sequence compression that makes
use of both statistical properties and repetition within sequences. A panel of experts is …

Compression-based data mining of sequential data

E Keogh, S Lonardi, CA Ratanamahatana… - Data Mining and …, 2007 - Springer
The vast majority of data mining algorithms require the setting of many input parameters.
The dangers of working with parameter-laden algorithms are twofold. First, incorrect settings …

Estimating the entropy of binary time series: Methodology, some theory and a simulation study

Y Gao, I Kontoyiannis, E Bienenstock - Entropy, 2008 - mdpi.com
Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and
extensive comparison between some of the most popular and effective entropy estimation …

On the complexity measures of genetic sequences

VD Gusev, LA Nemytikova, NA Chuzhanova - Bioinformatics, 1999 - academic.oup.com
Motivation: It is well known that the regulatory regions of genomes are highly repetitive. They
are rich in direct, symmetric and complemented repeats, and there is no doubt about the …

GReEn: a tool for efficient compression of genome resequencing data

AJ Pinho, D Pratas, SP Garcia - Nucleic acids research, 2012 - academic.oup.com
Research in the genomic sciences is confronted with the volume of sequencing and
resequencing data increasing at a higher pace than that of data storage and communication …