Fast model debias with machine unlearning

R Chen, J Yang, H Xiong, J Bai, T Hu… - Advances in …, 2024 - proceedings.neurips.cc
Advances in Neural Information Processing Systems, 2024
Abstract
Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on the large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases. This is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, and they also fall short in explaining where the biases within the model originate. To this end, we propose a fast model debiasing method (FMD), which offers an efficient approach to identify, evaluate, and remove biases inherent in trained models. FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets demonstrate that our method achieves superior or competitive classification accuracy compared with state-of-the-art retraining-based methods, while exhibiting significantly less bias at a much lower debiasing cost. Notably, our method requires only a small external dataset and updates to a minimal number of model parameters, without access to training data that may be too large or unavailable in practice.
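The abstract mentions influence functions and machine unlearning without giving formulas. As a rough illustration of the general idea only (this is not the FMD algorithm, whose details are not given here), the sketch below applies the standard influence-function approximation to a small L2-regularized logistic regression: a single Newton-style step estimates the parameters the model would have had if one training sample were removed, without retraining. The helper names (`grad_hess`, `fit`, `unlearn_sample`), the synthetic data, and the regularization strength are all assumptions made for this example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_hess(X, y, theta, lam):
    """Gradient and Hessian of sum-of-log-losses + (lam/2)*||theta||^2."""
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) + lam * theta
    hess = (X.T * (p * (1.0 - p))) @ X + lam * np.eye(X.shape[1])
    return grad, hess

def fit(X, y, lam=1.0, iters=25):
    """Train by plain Newton iterations (adequate for this small convex toy problem)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(X, y, theta, lam)
        theta = theta - np.linalg.solve(H, g)
    return theta

def unlearn_sample(X, y, theta, idx, lam=1.0):
    """Influence-style estimate of the parameters after deleting sample idx.

    Removing the sample subtracts its loss from the objective, so the gradient
    at the trained theta becomes -g_z; one Newton step with the full-data
    Hessian gives theta + H^{-1} g_z.
    """
    _, H = grad_hess(X, y, theta, lam)
    g_z = (sigmoid(X[idx] @ theta) - y[idx]) * X[idx]
    return theta + np.linalg.solve(H, g_z)

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -1.0, 0.5])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

theta = fit(X, y)
theta_unlearned = unlearn_sample(X, y, theta, idx=0)
# Reference answer: actually retrain without sample 0.
theta_retrained = fit(np.delete(X, 0, axis=0), np.delete(y, 0))
```

Comparing `theta_unlearned` with `theta_retrained` shows how closely the one-step update tracks full retraining on this toy problem. For a deep network the Hessian cannot be formed and solved exactly like this, so any practical method would need an approximation.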