Authors
Chengliang Chai, Guoliang Li
Publication date
2020/6
Journal
IEEE Data Eng. Bull.
Volume
43
Issue
3
Pages
37-52
Description
Human-in-the-loop techniques play an increasingly significant role in the machine learning pipeline, which consists of data preprocessing, data labeling, model training, and inference. Humans can not only provide training data for machine learning applications but also, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we first summarize the human-in-the-loop techniques in machine learning, including: (1) Data extraction: unstructured data usually needs to be transformed into structured data for feature engineering, and humans can provide training data or generate extraction rules. (2) Data integration: to enrich data or features, data integration joins the data with other tables, and humans can help with join operations that are hard for machines. (3) Data cleaning: real-world data is often dirty, so we can leverage human intelligence to clean the data and then induce rules to clean more of it. (4) Data annotation and iterative labeling: machine learning typically requires a large volume of high-quality training data, which humans can provide; when the budget is limited, iterative labeling is used to label the informative examples. (5) Model training and inference: for different applications (e.g., classification, clustering), different ML techniques are used to train the model and perform inference given human labels. We then summarize several commonly used techniques in human-in-the-loop machine learning that apply to the above modules, including quality improvement, cost reduction, latency reduction, active learning and weak …
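Item (4) describes iterative labeling under a limited budget. One common way to pick "informative examples" is least-confidence uncertainty sampling. In this approach, each round sends humans the unlabeled items whose predicted positive-class probability is closest to 0.5. The following is a minimal Python sketch of that idea for a binary classifier. It is an illustration only and is not taken from the paper. The functions select_most_uncertain and iterative_labeling are hypothetical. So are the callbacks predict and ask_human, which are assumed to retrain and score the pool, and to query a human worker.

```python
from typing import Callable, Dict, List, Sequence, Set


def select_most_uncertain(probs: Sequence[float], k: int, labeled: Set[int]) -> List[int]:
    """Return up to k unlabeled indices whose positive-class probability is closest to 0.5.

    Least-confidence sampling for a binary classifier: a probability near 0.5
    means the model is unsure, so a human label there is assumed most informative.
    """
    candidates = [i for i in range(len(probs)) if i not in labeled]
    candidates.sort(key=lambda i: abs(probs[i] - 0.5))
    return candidates[:k]


def iterative_labeling(
    predict: Callable[[Dict[int, bool]], Sequence[float]],  # hypothetical: retrain on labels, score the pool
    ask_human: Callable[[int], bool],                       # hypothetical: obtain a human label for item i
    budget: int,
    batch_size: int,
) -> Dict[int, bool]:
    """Spend a fixed labeling budget in rounds, asking humans about the most uncertain items."""
    labels: Dict[int, bool] = {}
    while len(labels) < budget:
        probs = predict(labels)
        k = min(batch_size, budget - len(labels))  # never exceed the remaining budget
        batch = select_most_uncertain(probs, k, set(labels))
        if not batch:  # the unlabeled pool is exhausted
            break
        for i in batch:
            labels[i] = ask_human(i)
    return labels
```

Consider scores [0.95, 0.52, 0.10, 0.47, 0.70] with a budget of 3 and batches of 2. The first round asks about items 1 and 3, and the second round spends the last unit of budget on item 4. In a real pipeline, predict would retrain on the new labels between rounds, so the scores would change.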