An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

A survey on blocking technology of entity resolution

BH Li, Y Liu, AM Zhang, WH Wang, S Wan - Journal of Computer Science …, 2020 - Springer
Entity resolution (ER) is a significant task in data integration, which aims to detect all entity
profiles that correspond to the same real-world entity. Due to its inherently quadratic …

Crowdsourcing database systems: Overview and challenges

C Chai, J Fan, G Li, J Wang… - 2019 IEEE 35th …, 2019 - ieeexplore.ieee.org
Many data management and analytics tasks, such as entity resolution, cannot be solely
addressed by automated processes. Crowdsourcing is an effective way to harness the …

Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

Selective data acquisition in the wild for model charging

C Chai, J Liu, N Tang, G Li, Y Luo - Proceedings of the VLDB …, 2022 - dl.acm.org
The lack of sufficient labeled data is a key bottleneck for practitioners in many real-world
supervised machine learning (ML) tasks. In this paper, we study a new problem, namely …

Domain adaptation for deep entity resolution

J Tu, J Fan, N Tang, P Wang, C Chai, G Li… - Proceedings of the …, 2022 - dl.acm.org
Entity resolution (ER) is a core problem of data integration. The state-of-the-art (SOTA)
results on ER are achieved by deep learning (DL) based methods, trained with a lot of …

Human-in-the-loop outlier detection

C Chai, L Cao, G Li, J Li, Y Luo, S Madden - Proceedings of the 2020 …, 2020 - dl.acm.org
Outlier detection is critical to a large number of applications from finance fraud detection to
health care. Although numerous approaches have been proposed to automatically detect …

GoodCore: Data-effective and data-efficient machine learning through coreset selection over incomplete data

C Chai, J Liu, N Tang, J Fan, D Miao, J Wang… - Proceedings of the …, 2023 - dl.acm.org
Given a dataset with incomplete data (e.g., missing values), training a machine learning
model over the incomplete data requires two steps. First, it requires a data-effective step that …

End-to-end entity resolution for big data: A survey

V Christophides, V Efthymiou, T Palpanas… - arXiv preprint arXiv …, 2019 - arxiv.org
One of the most important tasks for improving data quality and the reliability of data analytics
results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the …

Human-in-the-loop Techniques in Machine Learning

C Chai, G Li - IEEE Data Eng. Bull., 2020 - scholar.archive.org
Human-in-the-loop techniques are playing more and more significant roles in the machine
learning pipeline, which consists of data preprocessing, data labeling, model training and …