Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly …
Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often overlooked data structure is the hash table, even …
Modern database management systems (DBMS), primarily designed as general-purpose systems, face the challenging task of efficiently handling data from diverse sources for both …
The simplicity of Python and its rich set of libraries have made it the most popular language for data science. Moreover, the interpreted nature of Python offers an easy debugging …
W Huang, Y Ji, X Zhou, B He, KL Tan - Proceedings of the VLDB …, 2023 - dl.acm.org
In this paper, we seek to perform a rigorous experimental study of main-memory hash joins in storage class memory (SCM). In particular, we perform a design space exploration in real …
Data science pipelines are typically exploratory. An integral task of such pipelines is feature transformation, which transforms raw data into numerical matrices or tensors for …
The detection of constraint-based errors is a critical task in many data cleaning solutions. Previous works perform the task either using traditional data management systems or using …
A Kohn, V Leis, T Neumann - … of the 2021 International Conference on …, 2021 - dl.acm.org
Analytical queries virtually always involve aggregation and statistics. SQL offers a wide range of functionalities to summarize data such as associative aggregates, distinct …
P Fent, T Neumann - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
Groupjoins, the combined execution of a join and a subsequent group by, are common in analytical queries, and occur in about 1/8 of the queries in TPC-H and TPC-DS. While they …