Compressed full-text indexes

G Navarro, V Mäkinen - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …

A taxonomy of suffix array construction algorithms

SJ Puglisi, WF Smyth, AH Turpin - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix
trees and described the first algorithms for suffix array construction and use. Since that time …

Versatile and open software for comparing large genomes

S Kurtz, A Phillippy, AL Delcher, M Smoot, M Shumway… - Genome biology, 2004 - Springer
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at
varying evolutionary distances, as demonstrated by applications to multiple genomes. Two …

REPuter: the manifold applications of repeat analysis on a genomic scale

S Kurtz, JV Choudhuri, E Ohlebusch… - Nucleic acids …, 2001 - academic.oup.com
The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic
study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic …

Fast algorithms for large-scale genome alignment and comparison

AL Delcher, A Phillippy, J Carlton… - Nucleic acids …, 2002 - academic.oup.com
We describe a suffix-tree algorithm that can align the entire genome sequences of
eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The …

Alignment of whole genomes

AL Delcher, S Kasif, RD Fleischmann… - Nucleic acids …, 1999 - academic.oup.com
A new system for aligning whole genome sequences is described. Using an efficient data
structure called a suffix tree, the system is able to rapidly align sequences containing …

Using MUMmer to identify similar regions in large sequence sets

AL Delcher, SL Salzberg… - Current protocols in …, 2003 - Wiley Online Library
The MUMmer sequence alignment package is a suite of computer programs designed to
detect regions of homology in long biological sequences. Version 2.1 makes several …

Indexing compressed text

P Ferragina, G Manzini - Journal of the ACM (JACM), 2005 - dl.acm.org
We design two compressed data structures for the full-text indexing problem that support
efficient substring searches using roughly the space required for storing the text in …

Replacing suffix trees with enhanced suffix arrays

MI Abouelhoda, S Kurtz, E Ohlebusch - Journal of discrete algorithms, 2004 - Elsevier
The suffix tree is one of the most important data structures in string processing and
comparative genomics. However, the space consumption of the suffix tree is a bottleneck in …

Compressed suffix arrays and suffix trees with applications to text indexing and string matching

R Grossi, JS Vitter - Proceedings of the thirty-second annual ACM …, 2000 - dl.acm.org
The proliferation of online text, such as on the World Wide Web and in databases, motivates
the need for space-efficient index methods that support fast search. Consider a text T of n …