Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Construction of knowledge graphs: current state and challenges

M Hofer, D Obraczka, A Saeedi, H Köpcke, E Rahm - Information, 2024 - mdpi.com
With Knowledge Graphs (KGs) at the center of numerous applications such as recommender
systems and question-answering, the need for generalized pipelines to construct and …

Neo: A learned query optimizer

R Marcus, P Negi, H Mao, C Zhang, M Alizadeh… - arXiv preprint arXiv …, 2019 - arxiv.org
Query optimization is one of the most challenging problems in database systems. Despite
the progress made over the past decades, query optimizers remain extremely complex …

Big graphs: challenges and opportunities

W Fan - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Big data is typically characterized with 4V's: Volume, Velocity, Variety and Veracity. When it
comes to big graphs, these challenges become even more staggering. Each and every of …

Baran: Effective error correction via a unified context representation and transfer learning

M Mahdavi, Z Abedjan - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Traditional error correction solutions leverage handmade rules or master data to find the
correct values. Both are often amiss in real-world scenarios. Therefore, it is desirable to …

Rotom: A meta-learned data augmentation framework for entity matching, data cleaning, text classification, and beyond

Z Miao, Y Li, X Wang - … of the 2021 International Conference on …, 2021 - dl.acm.org
Deep Learning revolutionizes almost all fields of computer science including data
management. However, the demand for high-quality training data is slowing down deep …

Auto-Suggest: Learning-to-recommend data preparation steps using data science notebooks

C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org
Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …

Machine learning and data cleaning: Which serves the other?

IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org
The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …

VerifAI: verified generative AI

N Tang, C Yang, J Fan, L Cao, Y Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has made significant strides, yet concerns about the accuracy and reliability of
its outputs continue to grow. Such inaccuracies can have serious consequences such as …

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Data quality affects machine learning (ML) model performance, and data scientists spend
a considerable amount of time on data cleaning before model training. However, to date, there …