Management of machine learning lifecycle artifacts: A survey

M Schlegel, KU Sattler - ACM SIGMOD Record, 2023 - dl.acm.org
The explorative and iterative nature of developing and operating ML applications leads to a
variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software …

Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

Building trust in earth science findings through data traceability and results explainability

P Olaya, D Kennedy, R Llamas, L Valera… - … on Parallel and …, 2022 - ieeexplore.ieee.org
To trust findings in computational science, scientists need workflows that trace the data
provenance and support results explainability. As workflows become more complex, tracing …

PROV-IO: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems

R Han, M Zheng, S Byna, H Tang… - … on Parallel and …, 2024 - ieeexplore.ieee.org
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on
HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage …

A review of machine learning in scanpath analysis for passive gaze-based interaction

A Mohamed Selim, M Barz, OS Bhatti… - Frontiers in Artificial …, 2024 - frontiersin.org
The scanpath is an important concept in eye tracking. It refers to a person's eye movements
over a period of time, commonly represented as a series of alternating fixations and …

Deep learning provenance data integration: a practical approach

D Pina, A Chapman, D De Oliveira… - … Proceedings of the ACM …, 2023 - dl.acm.org
A Deep Learning (DL) life cycle involves several data transformations, such as performing
data pre-processing, defining datasets to train and test a deep neural network (DNN), and …

Understanding Business Users' Data-Driven Decision-Making: Practices, Challenges, and Opportunities

S Gathani, Z Liu, PJ Haas, Ç Demiralp - arXiv preprint arXiv:2212.13643, 2022 - arxiv.org
Business users perform data analysis to inform decisions for improving business processes
and outcomes despite having limited formal technical training. While earlier work has …

MLflow2PROV: extracting provenance from machine learning experiments

M Schlegel, KU Sattler - Proceedings of the Seventh Workshop on Data …, 2023 - dl.acm.org
Supporting iterative and explorative workflows for developing machine learning (ML)
models, ML experiment management systems (ML EMSs), such as MLflow, are increasingly …

Augmented lineage: traceability of data analysis including complex UDF processing

M Yamada, H Kitagawa, T Amagasa, A Matono - The VLDB Journal, 2023 - Springer
Data lineage allows information to be traced to its origin in data analysis by showing how the
results were derived. Although many methods have been proposed to identify the source …

ProvLight: Efficient workflow provenance capture on the edge-to-cloud continuum

D Rosendo, M Mattoso, A Costan… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Modern scientific workflows require hybrid infrastructures combining numerous
decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the …