A survey on bias and fairness in machine learning

N Mehrabi, F Morstatter, N Saxena, K Lerman… - ACM computing …, 2021 - dl.acm.org
With the widespread use of artificial intelligence (AI) systems and applications in our
everyday lives, accounting for fairness has gained significant importance in designing and …

Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

Overview of AuTexTification at IberLEF 2023: Detection and attribution of machine-generated text in multiple domains

AM Sarvazyan, JÁ González… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents the overview of the AuTexTification shared task as part of the IberLEF
2023 Workshop in Iberian Languages Evaluation Forum, within the framework of the SEPLN …

Data governance in the age of large-scale data-driven language technology

Y Jernite, H Nguyen, S Biderman, A Rogers… - Proceedings of the …, 2022 - dl.acm.org
The recent emergence and adoption of Machine Learning technology, and specifically of
Large Language Models, has drawn attention to the need for systematic and transparent …

Mitigating dataset harms requires stewardship: Lessons from 1000 papers

K Peng, A Mathur, A Narayanan - arXiv preprint arXiv:2108.02922, 2021 - arxiv.org
Machine learning datasets have elicited concerns about privacy, bias, and unethical
applications, leading to the retraction of prominent datasets such as DukeMTMC, MS-Celeb …

Fairness in deep learning: A survey on vision and language research

O Parraga, MD More, CM Oliveira, NS Gavenski… - ACM Computing …, 2023 - dl.acm.org
Despite being responsible for state-of-the-art results in several computer vision and natural
language processing tasks, neural networks have faced harsh criticism due to some of their …

Behavioral use licensing for responsible AI

D Contractor, D McDuff, JK Haines, J Lee… - Proceedings of the …, 2022 - dl.acm.org
With the growing reliance on artificial intelligence (AI) for many different applications, the
sharing of code, data, and models is important to ensure the replicability and …

Healthsheet: development of a transparency artifact for health datasets

N Rostamzadeh, D Mincu, S Roy, A Smart… - Proceedings of the …, 2022 - dl.acm.org
Machine learning (ML) approaches have demonstrated promising results in a wide range of
healthcare applications. Data plays a crucial role in developing ML-based healthcare …

Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of trustworthy FMware

AE Hassan, D Lin, GK Rajbahadur, K Gallaba… - … Proceedings of the …, 2024 - dl.acm.org
Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized
software development by enabling new use cases and business models. We refer to …

A framework for deprecating datasets: Standardizing documentation, identification, and communication

AS Luccioni, F Corry, H Sridharan, M Ananny… - Proceedings of the …, 2022 - dl.acm.org
Datasets are central to training machine learning (ML) models. The ML community has
recently made significant improvements to data stewardship and documentation practices …