Quality improvement methods are essential to gathering high-quality crowdsourced data, both for research and industry applications. A popular and broadly applicable method is task …
E Clark, T August, S Serrano, N Haduong… - arXiv preprint arXiv …, 2021 - arxiv.org
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge …
A Celikyilmaz, E Clark, J Gao - arXiv preprint arXiv:2006.14799, 2020 - arxiv.org
The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three …
We systematically study a wide variety of generative models spanning semantically-diverse image datasets to understand and improve the feature extractors and metrics used to …
Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate …
Crowdsourcing provides a scalable and efficient way to construct labeled datasets for training machine learning systems. However, creating comprehensive label guidelines for …
T Mitra, E Gilbert - Proceedings of the international AAAI conference on …, 2015 - ojs.aaai.org
Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this …
A Alkhatib, M Bernstein - Proceedings of the 2019 CHI Conference on …, 2019 - dl.acm.org
Errors and biases are earning algorithms increasingly malignant reputations in society. A central challenge is that algorithms must bridge the gap between high-level policy and on …
While datasets with single-label supervision have propelled rapid advances in image classification, additional annotations are necessary in order to quantitatively assess how …