A deep dive into common open formats for analytical DBMSs

C Liu, A Pavlenko, M Interlandi, B Haynes - Proceedings of the VLDB …, 2023 - dl.acm.org
This paper evaluates the suitability of Apache Arrow, Parquet, and ORC as formats for
subsumption in an analytical DBMS. We systematically identify and explore the high-level …

Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and Tuning

M Mozaffari, A Dignös, J Gamper, U Störl - ACM Computing Surveys, 2024 - dl.acm.org
Self-tuning is a feature of autonomic databases that includes the problem of automatic
schema design. It aims at providing an optimized schema that increases the overall …

Proteus: Autonomous adaptive storage for mixed workloads

M Abebe, H Lazu, K Daudjee - … of the 2022 International Conference on …, 2022 - dl.acm.org
Enterprises use distributed database systems to meet the demands of mixed or hybrid
transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) …

SAT: sampling acceleration tree for adaptive database repartition

X Xie, S Shi, H Wang, M Li - World Wide Web, 2023 - Springer
Nowadays, the volume of online data stored on websites is constantly increasing, and users'
demand for faster query response times is also on the rise with the expansion of network …

A survey on hybrid transactional and analytical processing

H Song, W Zhou, H Cui, X Peng, F Li - The VLDB Journal, 2024 - Springer
To provide applications with the ability to analyze fresh data and eliminate the
time-consuming ETL workflow, hybrid transactional and analytical (HTAP) systems have been …

Grouping time series for efficient columnar storage

C Fang, S Song, H Guan, X Huang, C Wang… - Proceedings of the ACM …, 2023 - dl.acm.org
Columnar storage is now an industry standard design in most open-source or commercial
time series database products, making them HTAP systems. The time column of a time …

Partition, Don't Sort! Compression Boosters for Cloud Data Ingestion Pipelines

P Hansert, S Michel - Proceedings of the VLDB Endowment, 2024 - dl.acm.org
Data Lakes deployed in the cloud are a go-to solution for enterprise data storage. While the
pay-as-you-go cost model allows flexible resource allocation and billing, it mandates an …

DrTM+B: Replication-driven live reconfiguration for fast and general distributed transaction processing

S Shen, X Wei, R Chen, H Chen… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recent in-memory database systems leverage advanced hardware features like RDMA to
provide transaction processing at millions of transactions per second. Distributed transaction …

SH2O: Efficient Data Access for Work-Sharing Databases

P Sioulas, I Mytilinis, A Ailamaki - … of the ACM on Management of Data, 2023 - dl.acm.org
Interactive applications require processing tens to hundreds of concurrent analytical queries
within tight time constraints. In such setups, where high concurrency causes contention …

Ameliorating data compression and query performance through cracked Parquet

P Hansert, S Michel - Proceedings of The International Workshop on Big …, 2022 - dl.acm.org
In this paper, we propose to exploit synergy effects between partitioning and compression
for Dremel-encoded nested data serving as the data storage for Spark-style processing jobs …