Big data systems meet machine learning challenges: towards big data science as a service

R Elshawi, S Sakr, D Talia, P Trunfio - Big Data Research, 2018 - Elsevier
Recently, we have been witnessing huge advancements in the scale of data we routinely
generate and collect in pretty much everything we do, as well as our ability to exploit modern …

Accelerating human-in-the-loop machine learning: Challenges and opportunities

D Xin, L Ma, J Liu, S Macke, S Song… - Proceedings of the …, 2018 - dl.acm.org
Development of machine learning (ML) workflows is a tedious process of iterative
experimentation: developers repeatedly make changes to workflows until the desired …

Data validation for machine learning

N Polyzotis, M Zinkevich, S Roy… - … of machine learning …, 2019 - proceedings.mlsys.org
Machine learning is a powerful tool for gleaning knowledge from massive amounts
of data. While a great deal of machine learning research has focused on improving the …

TFX: A TensorFlow-based production-scale machine learning platform

D Baylor, E Breck, HT Cheng, N Fiedel… - Proceedings of the 23rd …, 2017 - dl.acm.org
Creating and maintaining a platform for reliably producing and deploying machine learning
models requires careful orchestration of many components: a learner for generating …

Towards demystifying serverless machine learning training

J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered a growing interest on how to use it in
data-intensive applications such as ETL, query processing, or machine learning (ML). Several …

Automating large-scale data quality verification

S Schelter, D Lange, P Schmidt, M Celikel… - Proceedings of the …, 2018 - dl.acm.org
Modern companies and institutions rely on data to guide every single business process and
decision. Missing or incorrect information seriously compromises any decision process …

Data lifecycle challenges in production machine learning: a survey

N Polyzotis, S Roy, SE Whang, M Zinkevich - ACM SIGMOD Record, 2018 - dl.acm.org
Machine learning has become an essential tool for gleaning knowledge from data and
tackling a diverse set of computationally hard tasks. However, the accuracy of a machine …

The art and practice of data science pipelines: A comprehensive study of data science pipelines in theory, in-the-small, and in-the-large

S Biswas, M Wardat, H Rajan - … of the 44th International Conference on …, 2022 - dl.acm.org
Increasingly larger number of software systems today are including data science
components for descriptive, predictive, and prescriptive analytics. The collection of data …

Pseudorandom sets in Grassmann graph have near-perfect expansion

S Khot, D Minzer, M Safra - 2018 IEEE 59th Annual …, 2018 - ieeexplore.ieee.org
We prove that pseudorandom sets in the Grassmann graph have near-perfect expansion.
This completes the last missing piece of the proof of the 2-to-2-Games Conjecture (albeit …

End-to-end optimization of machine learning prediction queries

K Park, K Saur, D Banda, R Sen, M Interlandi… - Proceedings of the …, 2022 - dl.acm.org
Prediction queries are widely used across industries to perform advanced analytics and
draw insights from data. They include a data processing part (e.g., for joining, filtering …