Get another label? Improving data quality and data mining using multiple, noisy labelers

VS Sheng, F Provost, PG Ipeirotis - Proceedings of the 14th ACM …, 2008 - dl.acm.org
This paper addresses the repeated acquisition of labels for data items when the labeling is
imperfect. We examine the improvement (or lack thereof) in data quality via repeated …

Active learning: A survey

CC Aggarwal, X Kong, Q Gu, J Han, PS Yu - Data classification, 2014 - taylorfrancis.com
In all these cases, labels can be obtained, but only at a significant cost to the end user. An
important observation is that all records are not equally important from the perspective of …

Efficiently learning the accuracy of labeling sources for selective sampling

P Donmez, JG Carbonell, J Schneider - Proceedings of the 15th ACM …, 2009 - dl.acm.org
Many scalable data mining tasks rely on active learning to provide the most useful
accurately labeled instances. However, what if there are multiple labeling sources ('oracles' …

Repeated labeling using multiple noisy labelers

PG Ipeirotis, F Provost, VS Sheng, J Wang - Data Mining and Knowledge …, 2014 - Springer
This paper addresses the repeated acquisition of labels for data items when the labeling is
imperfect. We examine the improvement (or lack thereof) in data quality via repeated …

Adaptive sampling strategies to construct equitable training datasets

W Cai, R Encarnacion, B Chern… - Proceedings of the …, 2022 - dl.acm.org
In domains ranging from computer vision to natural language processing, machine learning
models have been shown to exhibit stark disparities, often performing worse for members of …

Active feature-value acquisition

M Saar-Tsechansky, P Melville… - Management …, 2009 - pubsonline.informs.org
Most induction algorithms for building predictive models take as input training data in the
form of feature vectors. Acquiring the values of features may be costly, and simply acquiring …

Icebreaker: Element-wise efficient information acquisition with a Bayesian deep latent Gaussian model

W Gong, S Tschiatschek, S Nowozin… - Advances in neural …, 2019 - proceedings.neurips.cc
In this paper, we address the ice-start problem, i.e., the challenge of deploying machine
learning models when only a little or no training data is initially available, and acquiring …

Class imbalance and active learning

J Attenberg, Ş Ertekin - Imbalanced Learning: Foundations …, 2013 - Wiley Online Library
This chapter focuses on the interaction between active learning (AL) and class imbalance,
discussing (i) AL techniques designed specifically for dealing with imbalanced settings, (ii) …

Active feature acquisition with supervised matrix completion

SJ Huang, M Xu, MK Xie, M Sugiyama, G Niu… - Proceedings of the 24th …, 2018 - dl.acm.org
Feature missing is a serious problem in many applications, which may lead to low quality of
training data and further significantly degrade the learning performance. While feature …

Learning to limit data collection via scaling laws: A computational interpretation for the legal principle of data minimization

D Shanmugam, F Diaz, S Shabanian, M Finck… - Proceedings of the …, 2022 - dl.acm.org
Modern machine learning systems are increasingly characterized by extensive personal
data collection, despite the diminishing returns and increasing societal costs of such …