Table understanding: Problem overview

A Shigarov - Wiley Interdisciplinary Reviews: Data Mining and …, 2023 - Wiley Online Library
Tables are probably the most natural way to represent relational data in various media and
formats. They store a large number of valuable facts that could be utilized for question …

Auto-em: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning

C Zhao, Y He - The World Wide Web Conference, 2019 - dl.acm.org
Entity matching (EM), also known as entity resolution, fuzzy join, and record linkage, refers to
the process of identifying records corresponding to the same real-world entities from …

Searching the enterprise

U Kruschwitz, C Hull - Foundations and Trends® in …, 2017 - nowpublishers.com
Search has become ubiquitous but that does not mean that search has been solved.
Enterprise search, which is broadly speaking the use of information retrieval technology to …

Synthesizing type-detection logic for rich semantic data types using open-source code

C Yan, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org
Given a table of data, existing systems can often detect basic atomic types (e.g., strings vs.
numbers) for each column. A new generation of data-analytics and data-preparation …

Demystifying Artificial Intelligence for Data Preparation

C Chai, N Tang, J Fan, Y Luo - … of the 2023 International Conference on …, 2023 - dl.acm.org
Data preparation--the process of discovering, integrating, transforming, cleaning, and
annotating data--is one of the oldest, hardest, yet inevitable data management problems …

Synthesizing mapping relationships using table corpus

Y Wang, Y He - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Mapping relationships, such as (country, country-code) or (company, stock-ticker), are
versatile data assets for an array of applications in data cleaning and data integration like …

First international workshop on professional search

S Verberne, J He, U Kruschwitz, G Wiggers… - ACM SIGIR Forum, 2019 - dl.acm.org
In this report we describe the outcome of the First International Workshop on Professional
Search, held in co-location with SIGIR 2018. The workshop addressed the specific …

Discovering enterprise concepts using spreadsheet tables

K Li, Y He, K Ganjam - Proceedings of the 23rd ACM SIGKDD …, 2017 - dl.acm.org
Existing work on knowledge discovery focuses on using natural language techniques to
extract entities and relationships from textual documents. However, today relational tables …

Auto-Tag: Tagging-Data-By-Example in Data Lakes

Y He, J Song, Y Wang, S Chaudhuri, V Anil… - arXiv preprint arXiv …, 2021 - arxiv.org
As data lakes become increasingly popular in large enterprises today, there is a growing
need to tag or classify data assets (e.g., files and databases) in data lakes with additional …

Reproducing state-of-the-art schema matching algorithms

A Ionescu - Delft University of Technology. http://resolver. tudelft …, 2020 - repository.tudelft.nl
We live in the digital era where content is produced every day [Dong and Srivastava, 2013]
due to the rapid expansion of technologies and the high accessibility of data in multiple …