Compressed full-text indexes

G Navarro, V Mäkinen - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …

Graph-based text representation and matching: A review of the state of the art and future challenges

AH Osman, OM Barukub - IEEE Access, 2020 - ieeexplore.ieee.org
Graph-based text representation is one of the important preprocessing steps in data and text
mining, Natural Language Processing (NLP), and information retrieval approaches. The …

[BOOK][B] Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences

G Navarro, M Raffinot - 2002 - books.google.com
Recent years have witnessed a dramatic increase of interest in sophisticated string matching
problems, especially in information retrieval and computational biology. This book presents …

At the roots of dictionary compression: string attractors

D Kempa, N Prezza - Proceedings of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …

[BOOK][B] Genome-scale algorithm design

V Mäkinen, D Belazzougui, F Cunial, AI Tomescu - 2015 - books.google.com
High-throughput sequencing has revolutionised the field of biological sequence analysis. Its
application has enabled researchers to address important biological questions, often for the …

Reducing the space requirement of suffix trees

S Kurtz - Software: Practice and Experience, 1999 - Wiley Online Library
We show that suffix trees store various kinds of redundant information. We exploit these
redundancies to obtain more space efficient representations. The most space efficient of our …

Factor oracle: A new structure for pattern matching

C Allauzen, M Crochemore, M Raffinot - … on Current Trends in Theory and …, 1999 - Springer
We introduce a new automaton on a word p, sequence of letters taken in an alphabet Σ, that
we call factor oracle. This automaton is acyclic, recognizes at least the factors of p, has m + 1 …

New text indexing functionalities of the compressed suffix arrays

K Sadakane - Journal of Algorithms, 2003 - Elsevier
New text indexing functionalities of the compressed suffix arrays are proposed. The
compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for …

[BOOK][B] Graph-theoretic techniques for web content mining

A Schenker, H Bunke, M Last, A Kandel - 2005 - books.google.com
This book describes exciting new opportunities for utilizing robust graph representations of
data with common machine learning algorithms. Graphs can model additional information …

Method and system for data compression in a relational database

CK Apanowicz, VK Eastwood, DA Slezak… - US Patent …, 2014 - Google Patents
A method for applying adaptive data compression in a relational database system using a
filter cascade having at least one compression filter stage in the filter cascade. The method …