Data and its (dis)contents: A survey of dataset development and use in machine learning research

A Paullada, ID Raji, EM Bender, E Denton, A Hanna - Patterns, 2021 - cell.com
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …

Open science by design: Realizing a vision for 21st century research

National Academies of Sciences, Policy, Global Affairs… - 2018 - books.google.com
Openness and sharing of information are fundamental to the progress of science and to the
effective functioning of the research enterprise. The advent of scientific journals in the 17th …

Do datasets have politics? Disciplinary values in computer vision dataset development

MK Scheuerman, A Hanna, E Denton - … of the ACM on Human-Computer …, 2021 - dl.acm.org
Data is a crucial component of machine learning. The field is reliant on data to train, validate,
and test models. With increased technical capabilities, machine learning research has …

Towards accountability for machine learning datasets: Practices from software engineering and infrastructure

B Hutchinson, A Smart, A Hanna, E Denton… - Proceedings of the …, 2021 - dl.acm.org
Datasets that power machine learning are often used, shared, and reused with little visibility
into the processes of deliberation that led to their creation. As artificial intelligence systems …

Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness

S Vollmer, BA Mateen, G Bohner, FJ Király, R Ghani… - BMJ, 2020 - bmj.com
Machine learning, artificial intelligence, and other modern statistical methods are providing
new opportunities to operationalise previously untapped and rapidly growing sources of …

Dataset search: a survey

A Chapman, E Simperl, L Koesten, G Konstantinidis… - The VLDB Journal, 2020 - Springer
Generating value from data requires the ability to find, access and make sense of datasets.
There are many efforts underway to encourage data sharing and reuse, from scientific …

Ten simple rules for innovative dissemination of research

T Ross-Hellauer, JP Tennant, V Banelytė… - PLoS Computational …, 2020 - journals.plos.org
How we communicate research is changing because of new (especially
digital) possibilities. This article sets out 10 easy steps researchers can take to disseminate …

Towards reproducible computational drug discovery

N Schaduangrat, S Lampa, S Simeon… - Journal of …, 2020 - Springer
The reproducibility of experiments has been a long standing impediment for further scientific
progress. Computational methods have been instrumental in drug discovery efforts owing to …

Critique and contribute: A practice-based framework for improving critical data studies and data science

G Neff, A Tanweer, B Fiore-Gartland, L Osburn - Big Data, 2017 - liebertpub.com
What would data science look like if its key critics were engaged to help improve it, and how
might critiques of data science improve with an approach that considers the day-to-day …

Alternative data and sentiment analysis: Prospecting non-standard data in machine learning-driven finance

KB Hansen, C Borch - Big Data & Society, 2022 - journals.sagepub.com
Social media commentary, satellite imagery and GPS data are a part of 'alternative data', that
is, data that originate outside of the standard repertoire of market data but are considered …