Towards scalable dataframe systems

D Petersohn, S Macke, D Xin, W Ma, D Lee… - arXiv preprint arXiv …, 2020 - arxiv.org
Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the
remarkable success of dataframe libraries in R and Python, dataframes face performance …

Accelerating machine learning inference with probabilistic predicates

Y Lu, A Chowdhery, S Kandula… - Proceedings of the 2018 …, 2018 - dl.acm.org
Classic query optimization techniques, including predicate pushdown, are of limited use for
machine learning inference queries, because the user-defined functions (UDFs) which …

Learning to sample: Counting with complex queries

B Walenz, S Sintos, S Roy, J Yang - arXiv preprint arXiv:1906.09335, 2019 - arxiv.org
We study the problem of efficiently estimating counts for queries involving complex filters,
such as user-defined functions, or predicates involving self-joins and correlated subqueries …

Sia: Optimizing queries using learned predicates

Q Zhou, J Arulraj, S Navathe, W Harris… - Proceedings of the 2021 …, 2021 - dl.acm.org
Predicate-centric rules for rewriting queries is a key technique in optimizing queries. These
include pushing down the predicate below the join and aggregation operators, or optimizing …

[BOOK][B] Dataframe systems: Theory, architecture, and implementation

D Petersohn - 2021 - search.proquest.com
Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the
remarkable success of dataframe libraries in R and Python, dataframes face performance …

AMNES: Accelerating the computation of data correlation using FPGAs

M Chiosa, TB Preußer, M Blott… - Proceedings of the …, 2023 - research-collection.ethz.ch
A widely used approach to characterize input data in both databases and ML is computing
the correlation between attributes. The operation is supported by all major database engines …

Perturbation analysis of database queries

B Walenz, J Yang - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
We present a system, Perada, for parallel perturbation analysis of database queries.
Perturbation analysis considers the results of a query evaluated with (a typically large …

FPGA-Based Systems for Stream Data Analytics and I/O Data Transformations

M Chiosa - 2023 - research-collection.ethz.ch
The distributed nature of the cloud, regarding resource placement and application
execution, incurs large data transfer. As data movement is unavoidable, it lies within the …

Interactive demonstration of probabilistic predicates

Y Lu, S Kandula, S Chaudhuri - … of the 2018 International Conference on …, 2018 - dl.acm.org
We will demonstrate a prototype query processing engine that uses probabilistic predicates
(PPs) to speed up machine learning inference jobs. In current analytic engines, machine …

[BOOK][B] Supporting Progressive Query Processing and Scalable Data Enrichment for Real Time Analytic Applications

D Ghosh - 2021 - search.proquest.com
In this thesis, we propose EnrichDB, a new DBMS technology designed for emerging
domains (e.g., social media analytics and sensor-driven smart spaces) that require incoming …