Deepjoin: Joinable table discovery with pre-trained language models

Y Dong, C Xiao, T Nozawa, M Enomoto… - arXiv preprint arXiv …, 2022 - arxiv.org
Due to its usefulness in data enrichment for data analysis tasks, joinable table discovery
has become an important operation in data lake management. Existing approaches target …

Retrieve, merge, predict: Augmenting tables with data lakes

R Cappuzzo, A Coelho, F Lefebvre, P Papotti… - arXiv preprint arXiv …, 2024 - arxiv.org
We present an in-depth analysis of data discovery in data lakes, focusing on table
augmentation for given machine learning tasks. We analyze alternative methods used in the …

Table Illustrator: Puzzle-based interactive authoring of plain tables

Y Huang, Y Yang, X Shu, R Chen, D Weng… - Proceedings of the CHI …, 2024 - dl.acm.org
Plain tables excel at displaying data details and are widely used in data presentation, often
polished to an elaborate appearance for readability in many scenarios. However, existing …

On the Use of Large Language Models for Table Tasks

Y Dong, M Oyamada, C Xiao, H Zhang - Proceedings of the 33rd ACM …, 2024 - dl.acm.org
The proliferation of large language models (LLMs) has catalyzed a diverse array of
applications. This tutorial delves into the application of LLMs for tabular data and targets a …

Calibration: A Simple Trick for Wide-table Delta Analytics

Z Huang, E Wu - arXiv preprint arXiv:2210.03851, 2022 - arxiv.org
Data analytics over normalized databases typically requires computing and materializing
expensive joins (wide-tables). Factorized query execution models execution as message …

Efficiently Estimating Mutual Information Between Attributes Across Tables

A Santos, F Korn, J Freire - arXiv preprint arXiv:2403.15553, 2024 - arxiv.org
Relational data augmentation is a powerful technique for enhancing data analytics and
improving machine learning models by incorporating columns from external datasets …

Toward Dynamic Data-Driven Time-Slicing LSH for Joinable Table Discovery

W Wang, C Zhu, H Yan - Electronics, 2024 - mdpi.com
In legacy industrial systems, discovering joinable information between database tables is
important for applications such as data integration and data analysis. Locality-Sensitive …

Retrieval-based Text Selection for Addressing Class-Imbalanced Data in Classification

S Ahmadi, A Shah, E Fox - arXiv preprint arXiv:2307.14899, 2023 - arxiv.org
This paper addresses the problem of selecting a set of texts for annotation in text
classification using retrieval methods when there are limits on the number of annotations …

GA-Tag: Data Enrichment with an Automatic Tagging System Utilizing Large Language Models

G Kusano - 2024 IEEE 40th International Conference on Data …, 2024 - ieeexplore.ieee.org
Data quality is widely recognized as being directly linked to the quality of analysis results. In
this study, we introduce a tagging method that simplifies the handling of extensive data and …

Efficient Algorithms for Correlated Data Discovery

ASR Santos - 2024 - search.proquest.com
The increase in our ability to collect and store data has led to an explosion in the number of
data repositories containing both public and enterprise data. While this abundance creates …