Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

The case for learned index structures

T Kraska, A Beutel, EH Chi, J Dean… - Proceedings of the 2018 …, 2018 - dl.acm.org
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a
record within a sorted array, a Hash-Index as a model to map a key to a position of a record …

Prescriptive analytics: a survey of emerging trends and technologies

D Frazzetto, TD Nielsen, TB Pedersen, L Šikšnys - The VLDB Journal, 2019 - Springer
This paper provides a survey of the state-of-the-art and future directions of one of the most
important emerging technologies within business analytics (BA), namely prescriptive …

SystemML: Declarative machine learning on Spark

M Boehm, MW Dusenberry, D Eriksson… - Proceedings of the …, 2016 - dl.acm.org
The rising need for custom machine learning (ML) algorithms and the growing data sizes
that require the exploitation of distributed, data-parallel frameworks such as MapReduce or …

The end of slow networks: It's time for a redesign

C Binnig, A Crotty, A Galakatos, T Kraska… - arXiv preprint arXiv …, 2015 - arxiv.org
Next generation high-performance RDMA-capable networks will require a fundamental
rethinking of the design and architecture of modern distributed DBMSs. These systems are …

Weld: A common runtime for high performance data analytics

S Palkar, JJ Thomas, A Shanbhag, D Narayanan… - 2017 - dspace.mit.edu
© 2017 Conference on Innovative Data Systems Research (CIDR). All rights reserved.
Modern analytics applications combine multiple functions from different libraries and …

Relaxed operator fusion for in-memory databases: Making compilation, vectorization, and prefetching work together at last

P Menon, TC Mowry, A Pavlo - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
In-memory database management systems (DBMSs) are a key component of modern on-
line analytic processing (OLAP) applications, since they provide low-latency access to large …

PRETZEL: Opening the black box of machine learning prediction serving systems

Y Lee, A Scolari, BG Chun, MD Santambrogio… - … USENIX Symposium on …, 2018 - usenix.org
Machine Learning models are often composed of pipelines of transformations. While this
design allows to efficiently execute single model components at training time, prediction …

Northstar: An interactive data science system

T Kraska - 2021 - dspace.mit.edu
© 2018 VLDB Endowment. In order to democratize data science, we need to fundamentally
rethink the current analytics stack, from the user interface to the “guts.” Most importantly …

Everything you always wanted to know about compiled and vectorized queries but were afraid to ask

T Kersten, V Leis, A Kemper, T Neumann… - Proceedings of the …, 2018 - dl.acm.org
The query engines of most modern database systems are either based on vectorization or
data-centric code generation. These two state-of-the-art query processing paradigms are …