Cerebro: A data system for optimized deep learning model selection

S Nakandala, Y Zhang, A Kumar - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Deep neural networks (deep nets) are revolutionizing many machine learning (ML)
applications. But there is a major bottleneck to wider adoption: the pain and resource …

Sliceline: Fast, linear-algebra-based slice finding for ml model debugging

S Sagadeeva, M Boehm - … of the 2021 international conference on …, 2021 - dl.acm.org
Slice finding---a recent work on debugging machine learning (ML) models---aims to find the
top-K data slices (e.g., conjunctions of predicates such as gender female and degree PhD) …

Distributed deep learning on data systems: a comparative analysis of approaches

Y Zhang, F McQuillan, N Jayaram, N Kak… - Proceedings of the …, 2021 - par.nsf.gov
Deep learning (DL) is growing in popularity for many data analytics applications, including
among enterprises. Large business-critical datasets in such settings typically reside in …

SystemDS: A declarative machine learning system for the end-to-end data science lifecycle

M Boehm, I Antonov, S Baunsgaard, M Dokter… - arXiv preprint arXiv …, 2019 - arxiv.org
Machine learning (ML) applications become increasingly common in many domains. ML
systems to execute these workloads include numerical computing frameworks and libraries …

Cerebro: A layered data platform for scalable deep learning

A Kumar, S Nakandala, Y Zhang, S Li… - … Annual Conference on …, 2021 - par.nsf.gov
Deep learning (DL) is gaining popularity across many domains thanks to tools such as
TensorFlow and easier access to GPUs. But building large-scale DL applications is still too …

Few-shot multi-view object classification via dual augmentation network

Y Zhou, H Lu, T Hao, X Li, AA Liu - Information Fusion, 2023 - Elsevier
Existing multi-view object classification algorithms usually rely on sufficient labeled multi-
view objects, which substantially restricts their scalability to novel classes with few annotated …

Exdra: Exploratory data science on federated raw data

S Baunsgaard, M Boehm, A Chaudhary… - Proceedings of the …, 2021 - dl.acm.org
Data science workflows are largely exploratory, dealing with under-specified objectives,
open-ended problems, and unknown business value. Therefore, little investment is made in …

Learning transferable and discriminative representations for 2D image-based 3D model retrieval

Y Zhou, Y Liu, H Zhou, Z Cheng, X Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Existing research on the 2D image-based 3D model retrieval task focuses on learning
transferable representations directly to narrow the domain discrepancy. However, it is not …

Model averaging in distributed machine learning: a case study with Apache Spark

Y Guo, Z Zhang, J Jiang, W Wu, C Zhang, B Cui, J Li - The VLDB Journal, 2021 - Springer
The increasing popularity of Apache Spark has attracted many users to put their data into its
ecosystem. On the other hand, it has been witnessed in the literature that Spark is slow …

AWARE: Workload-aware, Redundancy-exploiting Linear Algebra

S Baunsgaard, M Boehm - Proceedings of the ACM on Management of …, 2023 - dl.acm.org
Compression is an effective technique for fitting data in available memory, reducing I/O, and
increasing instruction parallelism. While data systems primarily rely on lossless …