Crowdsourcing database systems: Overview and challenges

C Chai, J Fan, G Li, J Wang… - 2019 IEEE 35th …, 2019 - ieeexplore.ieee.org
Many data management and analytics tasks, such as entity resolution, cannot be solely
addressed by automated processes. Crowdsourcing is an effective way to harness the …

Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

Selective data acquisition in the wild for model charging

C Chai, J Liu, N Tang, G Li, Y Luo - Proceedings of the VLDB …, 2022 - dl.acm.org
The lack of sufficient labeled data is a key bottleneck for practitioners in many real-world
supervised machine learning (ML) tasks. In this paper, we study a new problem, namely …

Domain adaptation for deep entity resolution

J Tu, J Fan, N Tang, P Wang, C Chai, G Li… - Proceedings of the …, 2022 - dl.acm.org
Entity resolution (ER) is a core problem of data integration. The state-of-the-art (SOTA)
results on ER are achieved by deep learning (DL) based methods, trained with a lot of …

Feature augmentation with reinforcement learning

J Liu, C Chai, Y Luo, Y Lou, J Feng… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Sufficient good features are indispensable to train well-performed machine learning models.
However, it is common that good features are not always enough, where feature …

Two-sided online micro-task assignment in spatial crowdsourcing

Y Tong, Y Zeng, B Ding, L Wang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
With the rapid development of smartphones, spatial crowdsourcing platforms are getting
popular. A foundational research of spatial crowdsourcing is to allocate micro-tasks to …

Human-in-the-loop outlier detection

C Chai, L Cao, G Li, J Li, Y Luo, S Madden - Proceedings of the 2020 …, 2020 - dl.acm.org
Outlier detection is critical to a large number of applications from finance fraud detection to
health care. Although numerous approaches have been proposed to automatically detect …

GoodCore: Data-effective and data-efficient machine learning through coreset selection over incomplete data

C Chai, J Liu, N Tang, J Fan, D Miao, J Wang… - Proceedings of the …, 2023 - dl.acm.org
Given a dataset with incomplete data (e.g., missing values), training a machine learning
model over the incomplete data requires two steps. First, it requires a data-effective step that …

Fluid: A blockchain based framework for crowdsourcing

S Han, Z Xu, Y Zeng, L Chen - … of the 2019 international conference on …, 2019 - dl.acm.org
Recently, crowdsourcing has emerged as a new computing paradigm to solve problems that
need human intrinsic, such as image annotation. However, there are two limitations in …

Interactively discovering and ranking desired tuples by data exploration

X Qin, C Chai, Y Luo, T Zhao, N Tang, G Li, J Feng… - The VLDB Journal, 2022 - Springer
Data exploration—the problem of extracting knowledge from database even if we do not
know exactly what we are looking for—is important for data discovery and analysis …