Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Compressed full-text indexes

G Navarro, V Mäkinen - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …

Fast and accurate short read alignment with Burrows–Wheeler transform

H Li, R Durbin - Bioinformatics, 2009 - academic.oup.com
Motivation: The enormous amount of short reads generated by the new DNA sequencing
technologies call for the development of fast and accurate read alignment programs. A first …

[BOOK][B] Modern information retrieval

R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

Opportunistic data structures with applications

P Ferragina, G Manzini - Proceedings 41st annual symposium …, 2000 - ieeexplore.ieee.org
We address the issue of compressing and indexing data. We devise a data structure whose
space occupancy is a function of the entropy of the underlying data set. We call the data …

POCLib: A high-performance framework for enabling near orthogonal processing on compression

F Zhang, J Zhai, X Shen, O Mutlu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Parallel technology boosts data processing in recent years, and parallel direct data
processing on hierarchically compressed documents exhibits great promise. The high …

Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets

R Raman, V Raman, SR Satti - ACM Transactions on Algorithms (TALG), 2007 - dl.acm.org
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0, …, m−1}
for some integer m while supporting the operations of rank(x), which returns the number …

Indexing compressed text

P Ferragina, G Manzini - Journal of the ACM (JACM), 2005 - dl.acm.org
We design two compressed data structures for the full-text indexing problem that support
efficient substring searches using roughly the space required for storing the text in …

[HTML] Replacing suffix trees with enhanced suffix arrays

MI Abouelhoda, S Kurtz, E Ohlebusch - Journal of discrete algorithms, 2004 - Elsevier
The suffix tree is one of the most important data structures in string processing and
comparative genomics. However, the space consumption of the suffix tree is a bottleneck in …