R Cappuzzo, A Coelho, F Lefebvre, P Papotti… - arXiv preprint arXiv …, 2024 - arxiv.org
We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the …
Plain tables excel at displaying data details and are widely used in data presentation, often polished to an elaborate appearance for readability in many scenarios. However, existing …
Y Dong, M Oyamada, C Xiao, H Zhang - Proceedings of the 33rd ACM …, 2024 - dl.acm.org
The proliferation of large language models (LLMs) has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a …
Z Huang, E Wu - arXiv preprint arXiv:2210.03851, 2022 - arxiv.org
Data analytics over normalized databases typically requires computing and materializing expensive joins (wide-tables). Factorized query execution models execution as message …
A Santos, F Korn, J Freire - arXiv preprint arXiv:2403.15553, 2024 - arxiv.org
Relational data augmentation is a powerful technique for enhancing data analytics and improving machine learning models by incorporating columns from external datasets …
W Wang, C Zhu, H Yan - Electronics, 2024 - mdpi.com
In legacy industrial systems, discovering joinable information between database tables is important for applications such as data integration and data analysis. Locality-Sensitive …
S Ahmadi, A Shah, E Fox - arXiv preprint arXiv:2307.14899, 2023 - arxiv.org
This paper addresses the problem of selecting a set of texts for annotation in text classification using retrieval methods when there are limits on the number of annotations …
G Kusano - 2024 IEEE 40th International Conference on Data …, 2024 - ieeexplore.ieee.org
Data quality is widely recognized as being directly linked to the quality of analysis results. In this study, we introduce a tagging method that simplifies the handling of extensive data and …
The increase in our ability to collect and store data has led to an explosion in the number of data repositories containing both public and enterprise data. While this abundance creates …