Efficient execution of user-defined functions in SQL queries

Y Foufoulas, A Simitsis - Proceedings of the VLDB Endowment, 2023 - dl.acm.org
User-defined functions (UDFs) have been widely used to overcome the expressivity
limitations of SQL and complement its declarative nature with functional capabilities. UDFs …

The History, Present, and Future of ETL Technology

A Simitsis, S Skiadopoulos, P Vassiliadis - DOLAP, 2023 - cs.uoi.gr
There is an abundance of data, but a large volume of it is unusable. Data may be noisy,
unstructured, stored in a medium or format incompatible with direct analysis, and often …

Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control

Y Huang, Z Wang, C Li - Proceedings of the ACM on Management of …, 2023 - dl.acm.org
Many big data systems are written in languages such as C, C++, Java, and Scala to process
large amounts of data efficiently, while data analysts often use Python to conduct data …

Investigating Learning Join Order Optimization Strategies for Rule-based Data Engines

A Karvelas, Y Foufoulas, A Simitsis… - Information Systems …, 2024 - Springer
A recent trend in data management research investigates whether machine learning
techniques could improve or replace core components of traditional database architectures …

Sharing Queries with Nonequivalent User-defined Aggregate Functions

C Zhang, T Farouk - ACM Transactions on Database Systems, 2024 - dl.acm.org
This article presents Sharing User-Defined Aggregate Function (SUDAF), a declarative
framework that allows users to write User-defined Aggregate Functions (UDAFs) as …

QFusor: A UDF Optimizer Plugin for SQL Databases

K Chasialis, T Palaiologou, Y Foufoulas… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Modern data applications in areas such as text mining, document analysis, and data
science involve complex algorithms and logic that cannot be expressed in SQL. Therefore …

Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs

X Liu, Y Huang, X Lin, A Kumar, S Alsudais… - Proceedings of the ACM …, 2024 - dl.acm.org
Data analytics tasks are often formulated as data workflows represented as directed acyclic
graphs (DAGs) of operators. The recent trend of adopting machine learning (ML) techniques …

Parallelized Path-finding in DuckPGQ

P Ren - 2024 - homepages.cwi.nl
Graphs, as an important model capable of describing real-world relationships, are getting
more and more attention from database researchers and developers. DuckPGQ is an …

MIP: Advanced Data Processing and Analytics for Science and Medicine

K Filippopolitis, Y Foufoulas, MN Garofalakis… - EDBT, 2024 - openproceedings.org
We present the Medical Informatics Platform (MIP), an online collaborative
platform for the scientific and medical community. It federates de-centralized patient data …

Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows

Y Huang, Z Wang, C Li - … of the 2024 International Conference on …, 2024 - dl.acm.org
Many big data systems are written in languages such as C, C++, Java, and Scala for high
efficiency, whereas data analysts often use Python to conduct data wrangling, statistical …