C Zhao, Y He - The World Wide Web Conference, 2019 - dl.acm.org
Entity matching (EM), also known as entity resolution, fuzzy join, and record linkage, refers to the process of identifying records corresponding to the same real-world entities from …
U Kruschwitz, C Hull - Foundations and Trends® in …, 2017 - nowpublishers.com
Search has become ubiquitous but that does not mean that search has been solved. Enterprise search, which is broadly speaking the use of information retrieval technology to …
C Yan, Y He - Proceedings of the 2018 International Conference on …, 2018 - dl.acm.org
Given a table of data, existing systems can often detect basic atomic types (e.g., strings vs. numbers) for each column. A new generation of data-analytics and data-preparation …
C Chai, N Tang, J Fan, Y Luo - … of the 2023 International Conference on …, 2023 - dl.acm.org
Data preparation--the process of discovering, integrating, transforming, cleaning, and annotating data--is one of the oldest, hardest, yet inevitable data management problems …
Y Wang, Y He - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Mapping relationships, such as (country, country-code) or (company, stock-ticker), are versatile data assets for an array of applications in data cleaning and data integration like …
In this report we describe the outcome of the First International Workshop on Professional Search, held in co-location with SIGIR 2018. The workshop addressed the specific …
K Li, Y He, K Ganjam - Proceedings of the 23rd ACM SIGKDD …, 2017 - dl.acm.org
Existing work on knowledge discovery focuses on using natural language techniques to extract entities and relationships from textual documents. However, today relational tables …
Y He, J Song, Y Wang, S Chaudhuri, V Anil… - arXiv preprint arXiv …, 2021 - arxiv.org
As data lakes become increasingly popular in large enterprises today, there is a growing need to tag or classify data assets (e.g., files and databases) in data lakes with additional …
A Ionescu - Delft University of Technology]. http://resolver. tudelft …, 2020 - repository.tudelft.nl
We live in the digital era where content is produced every day [Dong and Srivastava, 2013] due to the rapid expansion of technologies and the high accessibility of data in multiple …