Detecting and repairing dirty data is one of the perennial challenges in data analytics, and failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …
Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning. As these models …
Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a …
We present Ditto, a novel entity matching system based on pre-trained Transformer-based language models. We fine-tune and cast EM as a sequence-pair classification problem to …
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler of its great success is the availability of abundant and high-quality data for building machine …
Entity matching (EM) finds data instances that refer to the same real-world entity. In this paper we examine applying deep learning (DL) to EM, to understand DL's benefits and …
This is an overview of the end-to-end data cleaning process. Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data …
Entity resolution (ER) is a key data integration problem. Despite the efforts in 70+ years in all aspects of ER, there is still a high demand for democratizing ER-humans are heavily …
Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates addressing problems that are hard for computers, eg, entity resolution and sentiment …