A deep dive into common open formats for analytical DBMSs

C Liu, A Pavlenko, M Interlandi, B Haynes - Proceedings of the VLDB …, 2023 - dl.acm.org
This paper evaluates the suitability of Apache Arrow, Parquet, and ORC as formats for
subsumption in an analytical DBMS. We systematically identify and explore the high-level …

Decomposed bounded floats for fast compression and queries

C Liu, H Jiang, J Paparrizos, AJ Elmore - Proceedings of the VLDB …, 2021 - dl.acm.org
Modern data-intensive applications often generate large amounts of low precision float data
with a limited range of values. Despite the prevalence of such data, there is a lack of an …

An empirical evaluation of columnar storage formats

X Zeng, Y Hui, J Shen, A Pavlo, W McKinney… - arXiv preprint arXiv …, 2023 - arxiv.org
Columnar storage is a core component of a modern data analytics system. Although many
database management systems (DBMSs) have proprietary storage formats, most provide …

Designing succinct secondary indexing mechanism by exploiting column correlations

Y Wu, J Yu, Y Tian, R Sidle, R Barber - Proceedings of the 2019 …, 2019 - dl.acm.org
Database administrators construct secondary indexes on data tables to accelerate query
processing in relational database management systems (RDBMSs). These indexes are built …

Performance-optimal filtering: Bloom overtakes cuckoo at high throughput

H Lang, T Neumann, A Kemper, P Boncz - Proceedings of the VLDB …, 2019 - dl.acm.org
We define the concept of performance-optimal filtering to indicate the Bloom or Cuckoo filter
configuration that best accelerates a particular task. While the space-precision tradeoff of …

Pushing data-induced predicates through joins in big-data clusters

S Kandula, L Orr, S Chaudhuri - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
Using data statistics, we convert predicates on a table into data induced predicates (diPs)
that apply on the joining tables. Doing so substantially speeds up multi-relation queries …

Sieve: A Learned Data-Skipping Index for Data Analytics

Y Tong, J Liu, H Wang, K Zhou, R He, Q Zhang… - Proceedings of the …, 2023 - dl.acm.org
Modern data analytics services are coupled with external data storage services, making I/O
from remote cloud storage one of the dominant costs for query processing. Techniques such …

The periodic table of data structures

S Idreos, K Zoumpatianos, M Athanassoulis… - IEEE Data Eng …, 2018 - open.bu.edu
We describe the vision of being able to reason about the design space of data structures.
We break this down into two questions: 1) Can we know all data structures that is possible to …

Cuckoo index: A lightweight secondary index structure

A Kipf, D Chromejko, A Hall, P Boncz… - Proceedings of the …, 2020 - dl.acm.org
In modern data warehousing, data skipping is essential for high query performance. While
index structures such as B-trees or hash tables allow for precise pruning, their large storage …

RTScan: Efficient Scan with Ray Tracing Cores

Y Lv, K Zhang, Z Wang, X Zhang, R Lee, Z He… - Proceedings of the …, 2024 - dl.acm.org
Indexing is a core technique for accelerating predicate evaluation in databases. After many
years of effort, the indexing performance has reached its peak on the existing hardware …