Quality improvement methods are essential to gathering high-quality crowdsourced data, both for research and industry applications. A popular and broadly applicable method is task …
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in high-stakes AI due to its heightened …
E Clark, T August, S Serrano, N Haduong… - arXiv preprint arXiv …, 2021 - arxiv.org
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge …
Abstract Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …
Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a …
Computer vision models have known performance disparities across attributes such as gender and skin tone. This means during tasks such as classification and detection, model …
Diversity in datasets is a key component to building responsible AI/ML. Despite this recognition, we know little about the diversity among the annotators involved in data …
JD Harris, B Waggoner - 2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org
Machine learning has recently enabled large advances in artificial intelligence, but these tend to be highly centralized. The large datasets required are generally proprietary; …
B Shmueli, J Fell, S Ray, LW Ku - arXiv preprint arXiv:2104.10097, 2021 - arxiv.org
The use of crowdworkers in NLP research is growing rapidly, in tandem with the exponential increase in research production in machine learning and AI. Ethical discussion regarding …