TURL: Table understanding through representation learning

X Deng, H Sun, A Lees, Y Wu, C Yu - ACM SIGMOD Record, 2022 - dl.acm.org
Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such
tables, there has been tremendous progress on a variety of tasks in the area of table …

Bao: Making learned query optimization practical

R Marcus, P Negi, H Mao, N Tatbul… - Proceedings of the …, 2021 - dl.acm.org
Recent efforts applying machine learning techniques to query optimization have shown few
practical gains due to substantive training overhead, inability to adapt to changes, and poor …

Neo: A learned query optimizer

R Marcus, P Negi, H Mao, C Zhang, M Alizadeh… - arXiv preprint arXiv …, 2019 - arxiv.org
Query optimization is one of the most challenging problems in database systems. Despite
the progress made over the past decades, query optimizers remain extremely complex …

Creating embeddings of heterogeneous relational datasets for data integration tasks

R Cappuzzo, P Papotti… - Proceedings of the 2020 …, 2020 - dl.acm.org
Deep learning based techniques have been recently used with promising results for data
integration problems. Some methods directly use pre-trained embeddings that were trained …

Plan-structured deep neural network models for query performance prediction

R Marcus, O Papaemmanouil - arXiv preprint arXiv:1902.00132, 2019 - arxiv.org
Query performance prediction, the task of predicting the latency of a query, is one of the most
challenging problems in database management systems. Existing approaches rely on …

RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation

N Tang, J Fan, F Li, J Tu, X Du, G Li, S Madden… - arXiv preprint arXiv …, 2020 - arxiv.org
Can AI help automate human-easy but computer-hard data preparation tasks that burden
data scientists, practitioners, and crowd workers? We answer this question by presenting …

Data Curation with Deep Learning

S Thirumuruganathan, N Tang, M Ouzzani, AH Doan - EDBT, 2020 - openproceedings.org
Data curation (the process of discovering, integrating, and cleaning data) is one of the
oldest, hardest, yet inevitable data management problems. Despite decades of efforts from …

Leva: Boosting machine learning performance with relational embedding data augmentation

Z Zhao, R Castro Fernandez - … of the 2022 International Conference on …, 2022 - dl.acm.org
In this paper, we present Leva, an end-to-end system that boosts the performance of
machine learning tasks over relational data. Leva builds a relational embedding by …

Knowledge transfer for entity resolution with siamese neural networks

M Loster, I Koumarelas, F Naumann - Journal of Data and Information …, 2021 - dl.acm.org
The integration of multiple data sources is a common problem in a large variety of
applications. Traditionally, handcrafted similarity measures are used to discover, merge, and …

Unsupervised matching of data and text

N Ahmadi, H Sand, P Papotti - 2022 IEEE 38th International …, 2022 - ieeexplore.ieee.org
Entity resolution is a widely studied problem with several proposals to match records across
relations. Matching textual content is a widespread task in many applications, such as …