A reconstruction error-based framework for label noise detection

Z Salekshahrezaee, JL Leevy, TM Khoshgoftaar - Journal of Big Data, 2021 - Springer
Abstract
Label noise is an important data quality issue that negatively impacts machine learning algorithms. For example, label noise has been shown to increase the number of instances required to train effective predictive models. It has also been shown to increase model complexity and decrease model interpretability. In addition, label noise can cause the classification results of a learner to be poor. In this paper, we detect label noise with three unsupervised learners, namely principal component analysis (PCA), independent component analysis (ICA), and autoencoders. We evaluate these three learners on a credit card fraud dataset using multiple noise levels, and then compare results to the traditional Tomek links noise filter. Our binary classification approach, which considers label noise instances as anomalies, uniquely uses reconstruction errors for noisy data in order to identify and filter label noise. For detecting noisy instances, we discovered that the autoencoder algorithm was the top performer (highest recall score of 0.90), while Tomek links performed the worst (highest recall score of 0.62).
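
The idea of treating mislabeled instances as anomalies with large reconstruction errors can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic two-feature data, the single principal component (found by power iteration), the assumed noise rate of 5%, and all function names are assumptions made only for illustration. The sketch fits a one-component PCA to every instance carrying the same label, reconstructs each instance from that component, and flags the instances with the largest squared reconstruction error as suspected label noise.

```python
import math
import random

def first_component(rows, iters=200):
    """Return (mean, top principal direction) via power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v

def reconstruction_error(row, mu, v):
    """Squared distance between a row and its rank-1 PCA reconstruction."""
    c = [x - m for x, m in zip(row, mu)]
    score = sum(a * b for a, b in zip(c, v))  # projection onto the component
    return sum((a - score * b) ** 2 for a, b in zip(c, v))

rng = random.Random(0)

# Hypothetical data: correctly labeled instances lie close to the line y = 2x.
clean = []
for _ in range(200):
    x = rng.uniform(-5, 5)
    clean.append([x, 2 * x + rng.gauss(0, 0.1)])

# Hypothetical mislabeled instances: same label, but shifted off that line.
noisy = []
for _ in range(10):
    x = rng.uniform(-5, 5)
    noisy.append([x, 2 * x + 4.0])

data = clean + noisy
is_noise = [False] * len(clean) + [True] * len(noisy)

mu, v = first_component(data)
errors = [reconstruction_error(r, mu, v) for r in data]

# Flag the top 5% of instances by error (the noise rate is an assumption).
k = int(0.05 * len(data))
ranked = sorted(range(len(data)), key=lambda i: errors[i], reverse=True)
flagged = set(ranked[:k])

# Recall: fraction of truly mislabeled instances that were flagged.
recall = sum(1 for i in flagged if is_noise[i]) / sum(is_noise)
print(f"flagged={len(flagged)} recall={recall:.2f}")
```

The same ranking step would apply to ICA or an autoencoder by swapping in that learner's reconstruction; only `reconstruction_error` would change. In practice the threshold would need tuning, since the true noise rate is usually unknown.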
example.edu/paper.pdf