Collecting more diverse and representative training data is often touted as a remedy for the disparate performance of machine learning predictors across subpopulations. However, a …
Machine learning models often perform poorly on subgroups that are underrepresented in the training data. Yet, little is understood about the variation in mechanisms that cause …
We study why overparameterization—increasing model size well beyond the point of zero training error—can hurt test error on minority groups despite improving average test error …
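The quantity at stake in this line of work is the gap between average test error and error on the worst-off (typically minority) group. As a minimal sketch, the hypothetical helper below computes both from predictions and group labels; the data and group names are illustrative, and nothing here reproduces the cited paper's experimental setup.

```python
import numpy as np

def group_error_report(y_true, y_pred, groups):
    """Average test error, per-group test errors, and worst-group error.

    groups: array of group identifiers (e.g. demographic subpopulations),
    assumed to be available at evaluation time.
    """
    err = (y_true != y_pred).astype(float)
    avg_error = err.mean()
    group_errors = {g: err[groups == g].mean() for g in np.unique(groups)}
    worst_group_error = max(group_errors.values())
    return avg_error, group_errors, worst_group_error

# Hypothetical usage: a model may improve avg_error as capacity grows
# while worst_group_error (here the "min" group) gets worse.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0])
groups = np.array(["maj", "maj", "maj", "maj", "min", "min", "min", "min"])
print(group_error_report(y_true, y_pred, groups))
```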
Training a fair machine learning model is essential to prevent demographic disparity. Existing techniques for improving model fairness require broad changes in either data …
We introduce dataset multiplicity, a way to study how inaccuracies, uncertainty, and social bias in training datasets impact test-time predictions. The dataset multiplicity framework asks …
K Lum, Y Zhang, A Bower - Proceedings of the 2022 ACM Conference …, 2022 - dl.acm.org
When a model's performance differs across socially or culturally relevant groups–like race, gender, or the intersections of many such groups–it is often called "biased." While much of …
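The comparison described here is usually operationalized by computing a performance metric within each group, including intersectional groups formed by crossing several attributes. The sketch below shows that naive group-wise computation with a false-positive-rate metric on made-up data; it is not the bias-measurement estimator proposed in the cited paper, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

def per_group_metric(df, group_cols, metric):
    """Compute a metric within each group defined by group_cols;
    passing several columns yields intersectional groups."""
    return (
        df.groupby(group_cols, dropna=False)
          .apply(lambda g: metric(g["y_true"].to_numpy(), g["y_pred"].to_numpy()))
          .rename("metric")
    )

def false_positive_rate(y_true, y_pred):
    negatives = y_true == 0
    return np.nan if negatives.sum() == 0 else (y_pred[negatives] == 1).mean()

# Hypothetical data with two protected attributes.
df = pd.DataFrame({
    "y_true": [0, 1, 0, 0, 1, 0, 0, 1],
    "y_pred": [0, 1, 1, 0, 1, 1, 0, 0],
    "race":   ["a", "a", "b", "b", "a", "b", "a", "b"],
    "gender": ["f", "m", "f", "m", "m", "f", "f", "m"],
})
print(per_group_metric(df, ["race"], false_positive_rate))
print(per_group_metric(df, ["race", "gender"], false_positive_rate))
```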
M Li, H Namkoong, S Xia - Advances in Neural Information …, 2021 - proceedings.neurips.cc
The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case …
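One common formalization of worst-case subpopulation performance considers every subpopulation that contains at least an alpha fraction of the data; under that formalization the worst-case risk reduces to averaging the largest alpha fraction of per-example losses (a CVaR-style quantity). The sketch below computes that quantity on made-up losses and is not necessarily the estimator developed in the cited paper.

```python
import numpy as np

def worst_subpopulation_risk(losses, alpha):
    """Average loss over the worst alpha-fraction of examples.

    Sketch only: assumes the worst-case subpopulation risk over groups of
    mass at least alpha equals the mean of the top alpha-fraction of losses.
    """
    losses = np.sort(np.asarray(losses))[::-1]       # largest losses first
    k = max(1, int(np.ceil(alpha * len(losses))))    # smallest allowed subpopulation
    return losses[:k].mean()

per_example_losses = np.array([0.1, 0.05, 0.9, 0.2, 1.3, 0.0, 0.4, 0.7])
print(worst_subpopulation_risk(per_example_losses, alpha=0.25))  # worst 25%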
M Zhang, H Marklund, A Gupta… - arXiv preprint arXiv …, 2020 - marwandebbiche.github.io
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated …
Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to …
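The observation referenced here is that, across many models, out-of-distribution accuracy is roughly a linear function of in-distribution accuracy, often after a probit transform. As a sketch under that assumption, the snippet below fits such a line to hypothetical accuracy pairs and uses it to extrapolate; the numbers and the helper name are illustrative, not results from the cited work.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical in-distribution / out-of-distribution accuracies for a
# collection of models.
id_acc  = np.array([0.72, 0.80, 0.85, 0.90, 0.94])
ood_acc = np.array([0.55, 0.64, 0.70, 0.77, 0.83])

x = norm.ppf(id_acc)    # probit transform of ID accuracy
y = norm.ppf(ood_acc)   # probit transform of OOD accuracy
slope, intercept = np.polyfit(x, y, 1)

def predict_ood_accuracy(new_id_acc):
    """Predict OOD accuracy for a new model from its ID accuracy via the fitted line."""
    return norm.cdf(slope * norm.ppf(new_id_acc) + intercept)

print(slope, intercept, predict_ood_accuracy(0.88))
```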