Parallel technology has boosted data processing in recent years, and parallel direct data processing on hierarchically compressed documents exhibits great promise. The high …
F Zhang, W Wan, C Zhang, J Zhai, Y Chai… - Proceedings of the 2022 …, 2022 - dl.acm.org
In modern data management systems, directly performing operations on compressed data has been proven to be a big success facing big data problems. These systems have …
D Kempa, T Kociumaka - 2023 IEEE 64th Annual Symposium …, 2023 - ieeexplore.ieee.org
The last two decades have witnessed a dramatic increase in the amount of highly repetitive datasets consisting of sequential data (strings, texts). Processing these massive amounts of …
This article provides a comprehensive description of text analytics directly on compression (TADOC), which enables direct document analytics on compressed textual data. The article …
The information extraction framework of document spanners was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, J. ACM 2015) as a formalisation of the …
Z Pan, F Zhang, Y Zhou, J Zhai, X Shen… - … on Parallel and …, 2021 - ieeexplore.ieee.org
With the development of computer architecture, even for embedded systems, GPU devices can be integrated, providing outstanding performance and energy efficiency to meet the …
D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
F Zhang, Z Pan, Y Zhou, J Zhai, X Shen… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems …
F Claude, G Navarro, A Pacheco - Journal of Computer and System …, 2021 - Elsevier
Let a text T[1..n] be the only string generated by a context-free grammar with g (terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of …