Asset Management in Machine Learning: State-of-research and State-of-practice

S Idowu, D Strüber, T Berger - ACM Computing Surveys, 2022 - dl.acm.org
Machine learning components are essential for today's software systems, causing a need to
adapt traditional software engineering practices when developing machine-learning-based …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Operationalizing machine learning: An interview study

S Shankar, R Garcia, JM Hellerstein… - arXiv preprint arXiv …, 2022 - arxiv.org
Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy
and maintain ML pipelines in production. The process of operationalizing ML, or MLOps …

Management of machine learning lifecycle artifacts: A survey

M Schlegel, KU Sattler - ACM SIGMOD Record, 2023 - dl.acm.org
The explorative and iterative nature of developing and operating ML applications leads to a
variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software …

Capturing and querying fine-grained provenance of preprocessing pipelines in data science

A Chapman, P Missier, G Simonelli… - Proceedings of the VLDB …, 2020 - dl.acm.org
Data processing pipelines that are designed to clean, transform and alter data in preparation
for learning predictive models, have an impact on those models' accuracy and performance …

Data science through the looking glass: Analysis of millions of GitHub notebooks and ML.NET pipelines

F Psallidas, Y Zhu, B Karlas, J Henkel… - ACM SIGMOD …, 2022 - dl.acm.org
The recent success of machine learning (ML) has led to an explosive growth of systems and
applications built by an ever-growing community of system builders and data science (DS) …

Lightweight inspection of data preprocessing in native machine learning pipelines

S Grafberger, J Stoyanovich, S Schelter - Conference on Innovative Data …, 2021 - par.nsf.gov
Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks
arising from this wide-spread use are garnering attention from policy makers, scientists, and …

LIMA: Fine-grained lineage tracing and reuse in machine learning systems

A Phani, B Rath, M Boehm - … of the 2021 International Conference on …, 2021 - dl.acm.org
Machine learning (ML) and data science workflows are inherently exploratory. Data
scientists pose hypotheses, integrate the necessary data, and run ML pipelines of data …

Towards observability for production machine learning pipelines

S Shankar, A Parameswaran - arXiv preprint arXiv:2108.13557, 2021 - arxiv.org
Software organizations are increasingly incorporating machine learning (ML) into their
product offerings, driving a need for new data management tools. Many of these tools …

Workflow provenance in the lifecycle of scientific machine learning

R Souza, LG Azevedo, V Lourenço… - Concurrency and …, 2022 - Wiley Online Library
Machine learning (ML) has already fundamentally changed several businesses.
More recently, it has also been profoundly impacting the computational science and …