As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label columns of a table with one or more semantic types. With the recent …
C Li, D Zhang, J Wang - arXiv preprint arXiv:2408.16173, 2024 - arxiv.org
Detecting semantic types of columns in data lake tables is an important application. A key bottleneck in semantic type detection is the availability of human annotation due to the …
W Cho - arXiv preprint arXiv:2411.04443, 2024 - arxiv.org
Attention to table understanding with recent natural language models has been growing. However, most related works tend to focus on learning the structure of the table …
C Shen, J Wang - International Conference on Database Systems for …, 2024 - Springer
Long-form text matching plays a significant role in many real-world Natural Language Processing (NLP) and Information Retrieval (IR) applications. Recently, Transformer-based …
Data lakes are massive collections of structured and unstructured datasets. While these collections consist of various data formats, we focus on tabular data in data lakes. With the …