Data lifecycle challenges in production machine learning: a survey

N Polyzotis, S Roy, SE Whang, M Zinkevich - ACM SIGMOD Record, 2018 - dl.acm.org
Machine learning has become an essential tool for gleaning knowledge from data and
tackling a diverse set of computationally hard tasks. However, the accuracy of a machine …

Whither AutoML? Understanding the role of automation in machine learning workflows

D Xin, EY Wu, DJL Lee, N Salehi… - Proceedings of the 2021 …, 2021 - dl.acm.org
Efforts to make machine learning more widely accessible have led to a rapid increase in
Auto-ML tools that aim to automate the process of training and deploying machine learning …

Photon: A fast query engine for lakehouse systems

A Behm, S Palkar, U Agarwal, T Armstrong… - Proceedings of the …, 2022 - dl.acm.org
Many organizations are shifting to a data management paradigm called the "Lakehouse,"
which implements the functionality of structured data warehouses on top of unstructured …

A Berkeley view of systems challenges for AI

I Stoica, D Song, RA Popa, D Patterson… - arXiv preprint arXiv …, 2017 - arxiv.org
With the increasing commoditization of computer vision, speech recognition and machine
translation systems and the widespread deployment of learning-based back-end …

Ray RLlib: A composable and scalable reinforcement learning library

E Liang, R Liaw, R Nishihara, P Moritz, R Fox… - arXiv preprint arXiv …, 2017 - royf.org
Reinforcement learning (RL) algorithms involve the deep nesting of distinct components,
where each component typically exhibits opportunities for distributed computation. Current …

Everything you always wanted to know about compiled and vectorized queries but were afraid to ask

T Kersten, V Leis, A Kemper, T Neumann… - Proceedings of the …, 2018 - dl.acm.org
The query engines of most modern database systems are either based on vectorization or
data-centric code generation. These two state-of-the-art query processing paradigms are …

Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra

T Kersten, V Leis, T Neumann - The VLDB Journal, 2021 - Springer
Although compiling queries to efficient machine code has become a common approach for
query execution, a number of newly created database system projects still refrain from using …

The NebulaStream platform: Data and application management for the Internet of Things

S Zeuch, A Chaudhary, B Del Monte… - arXiv preprint arXiv …, 2019 - arxiv.org
The Internet of Things (IoT) presents a novel computing architecture for data management: a
distributed, highly dynamic, and heterogeneous environment of massive scale. Applications …

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster

M Zhang, S Rajbhandari, W Wang, Y He - 2018 USENIX Annual …, 2018 - usenix.org
Recurrent neural networks (RNNs) are an important class of deep learning (DL) models.
Existing DL frameworks have unsatisfying performance for online serving: many RNN …

Designing an open framework for query optimization and compilation

M Jungmair, A Kohn, J Giceva - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Since its invention, data-centric code generation has been adopted for query compilation by
various database systems in academia and industry. These database systems are fast but …