Data validation for machine learning

A Paleyes, RG Urma, ND Lawrence - ACM Computing Surveys, 2022 - dl.acm.org

In recent years, machine learning has transitioned from a field of academic research interest
to a field capable of solving real-world business problems. However, the deployment of …

Cited by 543

Overview and importance of data quality for machine learning tasks

A Jain, H Patel, L Nagalapatti, N Gupta… - Proceedings of the 26th …, 2020 - dl.acm.org

It is well understood from literature that the performance of a machine learning (ML) model is
upper bounded by the quality of the data. While researchers and practitioners have focused …

Cited by 330

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

N Sambasivan, S Kapania, H Highfill… - Proceedings of the …, 2021 - dl.acm.org

AI models are increasingly applied in high-stakes domains like health and conservation.
Data quality carries an elevated significance in high-stakes AI due to its heightened …

Cited by 861

Data collection and quality challenges in deep learning: A data-centric AI perspective

SE Whang, Y Roh, H Song, JG Lee - The VLDB Journal, 2023 - Springer

Data-centric AI is at the center of a fundamental shift in software engineering where machine
learning becomes the new software, powered by big data and computing infrastructure …

Cited by 381

Bridging the gap between ethics and practice: guidelines for reliable, safe, and trustworthy human-centered AI systems

B Shneiderman - ACM Transactions on Interactive Intelligent Systems …, 2020 - dl.acm.org

This article attempts to bridge the gap between widely discussed ethical principles of Human-
centered AI (HCAI) and practical steps for effective governance. Since HCAI systems are …

Cited by 658

Oort: Efficient federated learning via guided participant selection

F Lai, X Zhu, HV Madhyastha… - 15th USENIX Symposium …, 2021 - usenix.org

Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that
enables in-situ model training and testing on edge data. Despite having the same end goals …

Cited by 465

Machine learning testing: Survey, landscapes and horizons

JM Zhang, M Harman, L Ma… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

This paper provides a comprehensive survey of techniques for testing machine learning
systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing …

Cited by 992

Software engineering for AI-based systems: a survey

S Martínez-Fernández, J Bogner, X Franch… - ACM Transactions on …, 2022 - dl.acm.org

AI-based systems are software systems with functionalities enabled by at least one AI
component (e.g., for image- and speech-recognition, and autonomous driving). AI-based systems …

Cited by 256

[PDF] Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics

M Armbrust, A Ghodsi, R Xin… - Proceedings of …, 2021 - 15721.courses.cs.cmu.edu

This paper argues that the data warehouse architecture as we know it today will wither in the
coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) …

Cited by 300

Collaboration challenges in building ML-enabled systems: Communication, documentation, engineering, and process

N Nahar, S Zhou, G Lewis, C Kästner - Proceedings of the 44th …, 2022 - dl.acm.org

The introduction of machine learning (ML) components in software projects has created the
need for software engineers to collaborate with data scientists and other specialists. While …

Cited by 161
