Axiomatic foundations and algorithms for deciding semantic equivalences of SQL queries

S Chu, B Murphy, J Roesch, A Cheung… - arXiv preprint arXiv …, 2018 - arxiv.org
Deciding the equivalence of SQL queries is a fundamental problem in data management. As
prior work has mainly focused on studying the theoretical limitations of the problem, very few …

Automated verification of query equivalence using satisfiability modulo theories

Q Zhou, J Arulraj, S Navathe, W Harris… - Proceedings of the VLDB …, 2019 - dl.acm.org
Database-as-a-service offerings enable users to quickly create and deploy complex data
processing pipelines. In practice, these pipelines often exhibit significant overlap of …

Predicate pushdown for data science pipelines

C Yan, Y Lin, Y He - Proceedings of the ACM on Management of Data, 2023 - dl.acm.org
Predicate pushdown is a widely adopted query optimization. Existing systems and prior work
mostly use pattern-matching rules to decide when a predicate can be pushed through …

Optimizing recursive queries with program synthesis

YR Wang, M Abo Khamis, HQ Ngo, R Pichler… - Proceedings of the …, 2022 - dl.acm.org
Most work on query optimization has concentrated on loop-free queries. However, data
science and machine learning workloads today typically involve recursive or iterative …

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

S Zhang, L Diao, C Wu, Z Cao, S Wang… - Proceedings of the …, 2024 - dl.acm.org
Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large
deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous …

UDF to SQL translation through compositional lazy inductive synthesis

G Zhang, Y Xu, X Shen, I Dillig - … of the ACM on Programming Languages, 2021 - dl.acm.org
Many data processing systems allow SQL queries that call user-defined functions (UDFs)
written in conventional programming languages. While such SQL extensions provide …

Incorporating super-operators in big-data query optimizers

J Leeka, K Rajan - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
The cost of big-data analytics is dominated by shuffle operations that induce multiple disk
reads, writes and network transfers. This paper proposes a new class of optimization rules …

Synthesizing replacement classes

M Samak, D Kim, MC Rinard - Proceedings of the ACM on Programming …, 2019 - dl.acm.org
We present a new technique for automatically synthesizing replacement classes. The
technique starts with an original class O and a potential replacement class R, then uses R to …

SlabCity: Whole-Query Optimization Using Program Synthesis

R Dong, J Liu, Y Zhu, C Yan, B Mozafari… - Proceedings of the VLDB …, 2023 - dl.acm.org
Query rewriting is often a prerequisite for effective query optimization, particularly for
poorly-written queries. Prior work on query rewriting has relied on a set of "rules" based on syntactic …

Niijima: Sound and automated computation consolidation for efficient multilingual data-parallel pipelines

GH Xu, M Veanes, M Barnett, M Musuvathi… - Proceedings of the 27th …, 2019 - dl.acm.org
Multilingual data-parallel pipelines, such as Microsoft's Scope and Apache Spark, are widely
used in real-world analytical tasks. While the involvement of multiple languages (often …