Big data analytics on Apache Spark

S Salloum, R Dautov, X Chen, PX Peng… - International Journal of …, 2016 - Springer
Apache Spark has emerged as the de facto framework for big data analytics with its
advanced in-memory programming model and upper-level libraries for scalable machine …

Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

Federated machine learning: Concept and applications

Q Yang, Y Liu, T Chen, Y Tong - ACM Transactions on Intelligent …, 2019 - dl.acm.org
Today's artificial intelligence still faces two major challenges. One is that, in most industries,
data exists in the form of isolated islands. The other is the strengthening of data privacy and …

Towards demystifying serverless machine learning training

J Jiang, S Gan, Y Liu, F Wang, G Alonso… - Proceedings of the …, 2021 - dl.acm.org
The appeal of serverless (FaaS) has triggered a growing interest on how to use it in
data-intensive applications such as ETL, query processing, or machine learning (ML). Several …

SystemML: Declarative machine learning on Spark

M Boehm, MW Dusenberry, D Eriksson… - Proceedings of the …, 2016 - dl.acm.org
The rising need for custom machine learning (ML) algorithms and the growing data sizes
that require the exploitation of distributed, data-parallel frameworks such as MapReduce or …

A survey on large-scale machine learning

M Wang, W Fu, X He, S Hao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Machine learning can provide deep insights into data, allowing machines to make
high-quality predictions and having been widely used in real-world applications, such as text …

Learning linear regression models over factorized joins

M Schleich, D Olteanu, R Ciucanu - Proceedings of the 2016 …, 2016 - dl.acm.org
We investigate the problem of building least squares regression models over training
datasets defined by arbitrary join queries on database tables. Our key observation is that …

Probabilistic demand forecasting at scale

JH Böse, V Flunkert, J Gasthaus… - Proceedings of the …, 2017 - dl.acm.org
We present a platform built on large-scale, data-centric machine learning (ML) approaches,
whose particular focus is demand forecasting in retail. At its core, this platform enables the …

Cerebro: A data system for optimized deep learning model selection

S Nakandala, Y Zhang, A Kumar - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Deep neural networks (deep nets) are revolutionizing many machine learning (ML)
applications. But there is a major bottleneck to wider adoption: the pain and resource …

SliceLine: Fast, linear-algebra-based slice finding for ML model debugging

S Sagadeeva, M Boehm - … of the 2021 international conference on …, 2021 - dl.acm.org
Slice finding, a recent work on debugging machine learning (ML) models, aims to find the
top-K data slices (e.g., conjunctions of predicates such as gender female and degree PhD) …