On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜 EM Bender, T Gebru, A McMillan-Major, S Shmitchell Proceedings of the 2021 ACM Conference on Fairness, Accountability, and …, 2021 | 4479 | 2021 |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... arXiv preprint arXiv:2211.05100, 2022 | 1394 | 2022 |
Datasets: A Community Library for Natural Language Processing Q Lhoest, AV del Moral, Y Jernite, A Thakur, P von Platen, S Patil, ... arXiv preprint arXiv:2109.02846, 2021 | 222 | 2021 |
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset H Laurençon, L Saulnier, T Wang, C Akiki, AV del Moral, T Le Scao, ... Thirty-sixth Conference on Neural Information Processing Systems Datasets …, 2022 | 139 | 2022 |
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics S Gehrmann, T Adewumi, K Aggarwal, PS Ammanamanchi, ... arXiv preprint arXiv:2102.01672, 2021 | 133 | 2021 |
Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards A McMillan-Major, S Osei, JD Rodriguez, PS Ammanamanchi, ... arXiv preprint arXiv:2108.07374, 2021 | 49 | 2021 |
Measuring Data M Mitchell, AS Luccioni, N Lambert, M Gerchick, A McMillan-Major, ... arXiv preprint arXiv:2212.05129, 2022 | 24 | 2022 |
Automating Gloss Generation in Interlinear Glossed Text A McMillan-Major Proceedings of the Society for Computation in Linguistics 3 (1), 338-349, 2020 | 24 | 2020 |
Data Statements: From Technical Concept to Community Practice A McMillan-Major, EM Bender, B Friedman ACM Journal on Responsible Computing, 2023 | 15 | 2023 |
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code S Gehrmann, A Bhattacharjee, A Mahendiran, A Wang, A Papangelis, ... arXiv preprint arXiv:2206.11249, 2022 | 15 | 2022 |
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources A McMillan-Major, Z Alyafeai, S Biderman, K Chen, F De Toni, G Dupont, ... arXiv preprint arXiv:2201.10066, 2022 | 14 | 2022 |
An Interactive Exploratory Tool for the Task of Hate Speech Detection A McMillan-Major, A Paullada, Y Jernite Proceedings of the Second Workshop on Bridging Human–Computer Interaction …, 2022 | 3 | 2022 |
Data Statements: Documenting the datasets used for training and testing natural language processing systems A McMillan-Major, EM Bender, B Friedman Presented at: Scholarly Communication in Linguistics: Resource Workshop and …, 2022 | 1 | 2022 |