“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

N Sambasivan, S Kapania, H Highfill… - Proceedings of the …, 2021 - dl.acm.org
AI models are increasingly applied in high-stakes domains like health and conservation.
Data quality carries an elevated significance in high-stakes AI due to its heightened …

Data collection and quality challenges in deep learning: A data-centric AI perspective

SE Whang, Y Roh, H Song, JG Lee - The VLDB Journal, 2023 - Springer
Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …

Machine learning testing: Survey, landscapes and horizons

JM Zhang, M Harman, L Ma… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This paper provides a comprehensive survey of techniques for testing machine learning
systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing …

Software engineering challenges for machine learning applications: A literature review

F Kumeno - Intelligent Decision Technologies, 2019 - content.iospress.com
Machine learning techniques, especially deep learning, have achieved remarkable
breakthroughs over the past decade. At present, machine learning applications are …

A survey on data collection for machine learning: a big data-AI integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Software engineering for machine learning: A case study

S Amershi, A Begel, C Bird, R DeLine… - 2019 IEEE/ACM 41st …, 2019 - ieeexplore.ieee.org
Recent advances in machine learning have stimulated widespread interest within the
Information Technology sector on integrating AI capabilities into software and services. This …

Mitigating bias in radiology machine learning: 1. Data handling

P Rouzrokh, B Khosravi, S Faghani… - Radiology: Artificial …, 2022 - pubs.rsna.org
Minimizing bias is critical to adoption and implementation of machine learning (ML) in
clinical practice. Systematic mathematical biases produce consistent and reproducible …

Virtual homogeneity learning: Defending against data heterogeneity in federated learning

Z Tang, Y Zhang, S Shi, X He… - … on Machine Learning, 2022 - proceedings.mlr.press
In federated learning (FL), model performance typically suffers from client drift induced by
data heterogeneity, and mainstream works focus on correcting client drift. We propose a …

Collaboration challenges in building ML-enabled systems: Communication, documentation, engineering, and process

N Nahar, S Zhou, G Lewis, C Kästner - Proceedings of the 44th …, 2022 - dl.acm.org
The introduction of machine learning (ML) components in software projects has created the
need for software engineers to collaborate with data scientists and other specialists. While …

Data validation for machine learning

N Polyzotis, M Zinkevich, S Roy… - … of machine learning …, 2019 - proceedings.mlsys.org
Machine learning is a powerful tool for gleaning knowledge from massive amounts
of data. While a great deal of machine learning research has focused on improving the …