Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Opportunistic data structures with applications

P Ferragina, G Manzini - Proceedings 41st annual symposium …, 2000 - ieeexplore.ieee.org
We address the issue of compressing and indexing data. We devise a data structure whose
space occupancy is a function of the entropy of the underlying data set. We call the data …

Indexing compressed text

P Ferragina, G Manzini - Journal of the ACM (JACM), 2005 - dl.acm.org
We design two compressed data structures for the full-text indexing problem that support
efficient substring searches using roughly the space required for storing the text in …

Compressed suffix arrays and suffix trees with applications to text indexing and string matching

R Grossi, JS Vitter - Proceedings of the thirty-second annual ACM …, 2000 - dl.acm.org
The proliferation of online text, such as on the World Wide Web and in databases, motivates
the need for space-efficient index methods that support fast search. Consider a text T of n …

[图书][B] Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences

G Navarro, M Raffinot - 2002 - books.google.com
Recent years have witnessed a dramatic increase of interest in sophisticated string matching
problems, especially in information retrieval and computational biology. This book presents …

A review on document image analysis techniques directly in the compressed domain

M Javed, P Nagabhushan, BB Chaudhuri - Artificial Intelligence Review, 2018 - Springer
The rapid growth of digital libraries, e-governance, and internet based applications has
caused an exponential escalation in the volume of 'Big-data'particularly due to texts, images …

Application of Lempel–Ziv factorization to the approximation of grammar-based compression

W Rytter - Theoretical Computer Science, 2003 - Elsevier
We introduce new type of context-free grammars, AVL-grammars, and show their
applicability to grammar-based compression. Using this type of grammars we present O (n …

Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience

Z Li, M Sanghi, Y Chen, MY Kao… - 2006 IEEE Symposium …, 2006 - ieeexplore.ieee.org
Zero-day polymorphic worms pose a serious threat to the security of Internet infrastructures.
Given their rapid propagation, it is crucial to detect them at edge networks and automatically …

New text indexing functionalities of the compressed suffix arrays

K Sadakane - Journal of Algorithms, 2003 - Elsevier
New text indexing functionalities of the compressed suffix arrays are proposed. The
compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for …

[PDF][PDF] Byte pair encoding: A text compression scheme that accelerates pattern matching

Y Shibata, T Kida, S Fukamachi, M Takeda… - 1999 - researchgate.net
Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is
very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part …