Query processing on tensor computation runtimes

D He, S Nakandala, D Banda, R Sen, K Saur… - arXiv preprint arXiv …, 2022 - arxiv.org
The huge demand for computation in artificial intelligence (AI) is driving unparalleled
investments in hardware and software systems for AI. This leads to an explosion in the …

GPU Database Systems Characterization and Optimization

J Cao, R Sen, M Interlandi, J Arulraj, H Kim - Proceedings of the VLDB …, 2023 - dl.acm.org
GPUs offer massive parallelism and high-bandwidth memory access, making them an
attractive option for accelerating data analytics in database systems. However, while modern …

Auto-differentiation of relational computations for very large scale machine learning

Y Tang, Z Ding, D Jankov, B Yuan… - International …, 2023 - proceedings.mlr.press
The relational data model was designed to facilitate large-scale data management and
analytics. We consider the problem of how to differentiate computations expressed …

JoinBoost: Grow trees over normalized data using only SQL

Z Huang, R Sen, J Liu, E Wu - arXiv preprint arXiv:2307.00422, 2023 - arxiv.org
Although dominant for tabular data, ML libraries that train tree models over normalized
databases (e.g., LightGBM, XGBoost) require the data to be denormalized as a single table …

The tensor data platform: Towards an AI-centric database system

A Gandhi, Y Asada, V Fu, A Gemawat, L Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Database engines have historically absorbed many of the innovations in data processing,
adding features to process graph data, XML, object oriented, and text among many others. In …

MaskSearch: Querying Image Masks at Scale

D He, J Zhang, M Daum, A Ratner… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning tasks over image databases often generate masks that annotate image
content (e.g., saliency maps, segmentation maps, depth maps) and enable a variety of …

Bullion: A Column Store for Machine Learning

G Liao, Y Liu, J Chen, DJ Abadi - arXiv preprint arXiv:2404.08901, 2024 - arxiv.org
The past two decades have witnessed columnar storage revolutionizing data warehousing
and analytics. However, the rapid growth of machine learning poses new challenges to this …

The Duck's Brain: Training and Inference of Neural Networks in Modern Database Engines

ME Schüle, T Neumann, A Kemper - arXiv preprint arXiv:2312.17355, 2023 - arxiv.org
Although database systems perform well in data access and manipulation, their relational
model hinders data scientists from formulating machine learning algorithms in SQL …

Random Forests over normalized data in CPU-GPU DBMSes

Z Huang, PK Damalapati, R Sen, E Wu - Proceedings of the 19th …, 2023 - dl.acm.org
This short paper studies query execution based on message passing on CPU-GPU systems,
using random forests training as the workload. We investigate different data placement and …

Teaching Blue Elephants the Maths for Machine Learning

C Ruck, ME Schüle - Proceedings of the Seventh Workshop on Data …, 2023 - dl.acm.org
Code-generation suits well for reverse mode automatic differentiation as it stores each
partial derivative as a virtual register. Since the introduction of just-in-time compilation in …