Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

POCLib: A high-performance framework for enabling near orthogonal processing on compression

F Zhang, J Zhai, X Shen, O Mutlu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Parallel technology boosts data processing in recent years, and parallel direct data
processing on hierarchically compressed documents exhibits great promise. The high …

CompressDB: Enabling efficient compressed data direct processing for various databases

F Zhang, W Wan, C Zhang, J Zhai, Y Chai… - Proceedings of the 2022 …, 2022 - dl.acm.org
In modern data management systems, directly performing operations on compressed data
has been proven to be a big success facing big data problems. These systems have …

Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space

D Kempa, T Kociumaka - 2023 IEEE 64th Annual Symposium …, 2023 - ieeexplore.ieee.org
The last two decades have witnessed a dramatic increase in the amount of highly repetitive
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …

TADOC: Text analytics directly on compression

F Zhang, J Zhai, X Shen, D Wang, Z Chen, O Mutlu… - The VLDB Journal, 2021 - Springer
This article provides a comprehensive description of text analytics directly on compression
(TADOC), which enables direct document analytics on compressed textual data. The article …

Document spanners - a brief overview of concepts, results, and recent developments

ML Schmid, N Schweikardt - Proceedings of the 41st ACM SIGMOD …, 2022 - dl.acm.org
The information extraction framework of document spanners was introduced by Fagin,
Kimelfeld, Reiss, and Vansummeren (PODS 2013, J. ACM 2015) as a formalisation of the …

Exploring data analytics without decompression on embedded GPU systems

Z Pan, F Zhang, Y Zhou, J Zhai, X Shen… - … on Parallel and …, 2021 - ieeexplore.ieee.org
With the development of computer architecture, even for embedded systems, GPU devices
can be integrated, providing outstanding performance and energy efficiency to meet the …

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

G-TADOC: Enabling efficient GPU-based text analytics without decompression

F Zhang, Z Pan, Y Zhou, J Zhai, X Shen… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Text analytics directly on compression (TADOC) has proven to be a promising technology for
big data analytics. GPUs are extremely popular accelerators for data analytics systems …

Grammar-compressed indexes with logarithmic search time

F Claude, G Navarro, A Pacheco - Journal of Computer and System …, 2021 - Elsevier
Let a text T[1..n] be the only string generated by a context-free grammar with g
(terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of …