Extended-alphabet finite-context models

JM Carvalho, S Brás, D Pratas, J Ferreira… - Pattern Recognition …, 2018 - Elsevier
The normalized relative compression (NRC) is a recent dissimilarity measure, related to the
Kolmogorov complexity. It has been successfully used in different applications, like DNA …

Computation of the Normalized Compression Distance of DNA Sequences using a Mixture of Finite-context Models

D Pratas, AJ Pinho, SP Garcia - Bioinformatics, 2012 - academia.edu
A compression-based similarity measure assesses the similarity between two objects using
the number of bits needed to describe one of them when a description of the other is …

Compression-based normal similarity measures for DNA sequences

PJSG Ferreira, AJ Pinho - 2014 IEEE International Conference …, 2014 - ieeexplore.ieee.org
Similarity measures based on compression assess the distance between two objects based
on the number of bits needed to describe one, given a description of the other. Theoretically …

On the representability of complete genomes by multiple competing finite-context (Markov) models

AJ Pinho, PJSG Ferreira, AJR Neves, CAC Bastos - PloS one, 2011 - journals.plos.org
A finite-context (Markov) model of order k yields the probability distribution of the next symbol
in a sequence of symbols, given the recent past up to depth k. Markov modeling has long …

Comparison of compression-based measures with application to the evolution of primate genomes

D Pratas, RM Silva, AJ Pinho - Entropy, 2018 - mdpi.com
An efficient DNA compressor furnishes an approximation to measure and compare
information quantities present in, between and across DNA sequences, regardless of the …

A similarity measure using smallest context-free grammars

D Cerra, M Datcu - 2010 Data Compression Conference, 2010 - ieeexplore.ieee.org
This work presents a new approximation for the Kolmogorov complexity of strings based on
compression with smallest Context Free Grammars (CFG). If, for a given string, a dictionary …

The application of data compression-based distances to biological sequences

A Kertész-Farkas, A Kocsor, S Pongor - Information Theory and Statistical …, 2009 - Springer
Text compressor algorithms can be used to construct metric distance measures (CBDs)
suitable for character sequences. Here we review the principle of various types of …

A model conditioned data compression based similarity measure

D Cerra, M Datcu - Data Compression Conference (DCC 2008), 2008 - ieeexplore.ieee.org
Many methodologies and similarity measures based on data compression have been
recently introduced to compute similarities between general kinds of data. Two important …

Combination of n-grams and stochastic context-free grammars in an offline handwritten recognition system

V Romero, V Alabau, JM Benedí - Iberian Conference on Pattern …, 2007 - Springer
One area of pattern recognition that is receiving a lot of attention recently is handwritten text
recognition. Traditionally, handwritten text recognition systems have been modelled by …

DNA sequence modeling based on context trees

CJ Kusters, T Ignatenko - Proceedings of the 36th WIC Symposium …, 2015 - research.tue.nl
Genomic sequences contain instructions for protein and cell production. Therefore
understanding and identification of biologically and functionally meaningful patterns in DNA …