Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities

S Cohen-Boulakia, K Belhajjame, O Collin… - Future Generation …, 2017 - Elsevier
With the development of new experimental technologies, biologists are faced with an
avalanche of data to be computationally analyzed for scientific advancements and …

The role of metadata in reproducible computational research

J Leipzig, D Nüst, CT Hoyt, K Ram, J Greenberg - Patterns, 2021 - cell.com
Reproducible computational research (RCR) is the keystone of the scientific method for in
silico analyses, packaging the transformation of raw data to published results. In addition to …

A survey on provenance: What for? What form? What from?

M Herschel, R Diestelkämper, H Ben Lahmar - The VLDB Journal, 2017 - Springer
Provenance refers to any information describing the production process of an end product,
which can be anything from a piece of digital data to a physical object. While this survey …

Computing environments for reproducibility: Capturing the “Whole Tale”

A Brinckman, K Chard, N Gaffney, M Hategan… - Future Generation …, 2019 - Elsevier
The act of sharing scientific knowledge is rapidly evolving away from traditional articles and
presentations to the delivery of executable objects that integrate the data and computational …

Outlining traceability: A principle for operationalizing accountability in computing systems

JA Kroll - Proceedings of the 2021 ACM Conference on Fairness …, 2021 - dl.acm.org
Accountability is widely understood as a goal for well governed computer systems, and is a
sought-after value in many governance contexts. But how can it be achieved? Recent work …

FAIR computational workflows

C Goble, S Cohen-Boulakia, S Soiland-Reyes… - Data …, 2020 - direct.mit.edu
Computational workflows describe the complex multi-step methods that are used for data
collection, data preparation, analytics, predictive modelling, and simulation that lead to new …

Capturing and querying fine-grained provenance of preprocessing pipelines in data science

A Chapman, P Missier, G Simonelli… - Proceedings of the VLDB …, 2020 - dl.acm.org
Data processing pipelines that are designed to clean, transform and alter data in preparation
for learning predictive models, have an impact on those models' accuracy and performance …

Vamsa: Automated provenance tracking in data science scripts

MH Namaki, A Floratou, F Psallidas… - Proceedings of the 26th …, 2020 - dl.acm.org
There has recently been a lot of ongoing research in the areas of fairness, bias and
explainability of machine learning (ML) models due to the self-evident or regulatory …

Data distribution debugging in machine learning pipelines

S Grafberger, P Groth, J Stoyanovich, S Schelter - The VLDB Journal, 2022 - Springer
Machine learning (ML) is increasingly used to automate impactful decisions, and
the risks arising from this widespread use are garnering attention from policy makers …

noWorkflow: a tool for collecting, analyzing, and managing provenance from Python scripts

JF Pimentel, L Murta, V Braganholo… - Proceedings of the VLDB …, 2017 - par.nsf.gov
We present noWorkflow, an open-source tool that systematically and transparently collects
provenance from Python scripts, including data about the script execution and how the script …