Racial disparity in natural language processing: A case study of social media African-American...

RS Baker, A Hawn - International Journal of Artificial Intelligence in …, 2022 - Springer

In this paper, we review algorithmic bias in education, discussing the causes of that bias and
reviewing the empirical literature on the specific ways that algorithmic bias is known to have …

Cited by 381 · Related articles · All 4 versions

Fairness in machine learning: A survey

S Caton, C Haas - ACM Computing Surveys, 2024 - dl.acm.org

When Machine Learning technologies are used in contexts that affect citizens, companies as
well as researchers need to be confident that there will not be any unexpected social …

Cited by 609 · Related articles · All 5 versions

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Cited by 880 · Related articles · All 5 versions

DoReMi: Optimizing data mixtures speeds up language model pretraining

SM Xie, H Pham, X Dong, N Du, H Liu… - Advances in …, 2024 - proceedings.neurips.cc

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly
affect language model (LM) performance. In this paper, we propose Domain Reweighting …

Cited by 80 · Related articles · All 6 versions

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 3593 · Related articles · All 2 versions

Data selection for language models via importance resampling

SM Xie, S Santurkar, T Ma… - Advances in Neural …, 2023 - proceedings.neurips.cc

Selecting a suitable pretraining dataset is crucial for both general-domain (e.g., GPT-3) and
domain-specific (e.g., Codex) language models (LMs). We formalize this problem as selecting …

Cited by 82 · Related articles · All 5 versions

Language (technology) is power: A critical survey of "bias" in NLP

SL Blodgett, S Barocas, H Daumé III… - arXiv preprint arXiv …, 2020 - arxiv.org

We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are
often vague, inconsistent, and lacking in normative reasoning, despite the fact that …

Cited by 1153 · Related articles · All 5 versions

Racial disparities in automated speech recognition

A Koenecke, A Nam, E Lake, J Nudell… - Proceedings of the …, 2020 - National Acad Sciences

Automated speech recognition (ASR) systems, which use sophisticated machine-learning
algorithms to convert spoken language to text, have become increasingly widespread …

Cited by 667 · Related articles · All 17 versions

WILDS: A benchmark of in-the-wild distribution shifts

PW Koh, S Sagawa, H Marklund… - International …, 2021 - proceedings.mlr.press

Distribution shifts—where the training distribution differs from the test distribution—can
substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …

Cited by 1325 · Related articles · All 13 versions

Typology of risks of generative text-to-image models

C Bird, E Ungless, A Kasirzadeh - Proceedings of the 2023 AAAI/ACM …, 2023 - dl.acm.org

This paper investigates the direct risks and harms associated with modern text-to-image
generative models, such as DALL-E and Midjourney, through a comprehensive literature …

Cited by 64 · Related articles · All 5 versions
