Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

Performance evaluation of regression models for COVID-19: A statistical and predictive perspective

MA Khan, R Khan, F Algarni, I Kumar… - Ain Shams Engineering …, 2022 - Elsevier
Research is very important in the pandemic situation of COVID-19 to deliver a speedy
solution to this problem. COVID-19 has presented governments, corporations and ordinary …

Random sampling over joins revisited

Z Zhao, R Christensen, F Li, X Hu, K Yi - Proceedings of the 2018 …, 2018 - dl.acm.org
Joins are expensive, especially on large data and/or multiple relations. One promising
approach in mitigating their high costs is to just return a simple random sample of the full join …

Data management for machine learning: A survey

C Chai, J Wang, Y Luo, Z Niu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Machine learning (ML) has widespread applications and has revolutionized many
industries, but suffers from several challenges. First, sufficient high-quality training data is …

SPORES: sum-product optimization via relational equality saturation for large scale linear algebra

YR Wang, S Hutchison, J Leang, B Howe… - arXiv preprint arXiv …, 2020 - arxiv.org
Machine learning algorithms are commonly specified in linear algebra (LA). LA expressions
can be rewritten into more efficient forms, by taking advantage of input properties such as …

Data lakes: A survey of functions and systems

R Hai, C Koutras, C Quix… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Data lakes are becoming increasingly prevalent for Big Data management and data
analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses …

Research directions for principles of data management (Dagstuhl Perspectives Workshop 16151)

S Abiteboul, M Arenas, P Barceló, M Bienvenu… - 2018 - drops.dagstuhl.de
The area of Principles of Data Management (PDM) has made crucial contributions to the
development of formal frameworks for understanding and managing data and knowledge …

DBEst: Revisiting approximate query processing engines with machine learning models

Q Ma, P Triantafillou - Proceedings of the 2019 International Conference …, 2019 - dl.acm.org
In the era of big data, computing exact answers to analytical queries becomes prohibitively
expensive. This greatly increases the value of approaches that can compute efficiently …

SystemDS: A declarative machine learning system for the end-to-end data science lifecycle

M Boehm, I Antonov, S Baunsgaard, M Dokter… - arXiv preprint arXiv …, 2019 - arxiv.org
Machine learning (ML) applications become increasingly common in many domains. ML
systems to execute these workloads include numerical computing frameworks and libraries …

Factorized databases

D Olteanu, M Schleich - ACM SIGMOD Record, 2016 - dl.acm.org
This paper overviews factorized databases and their application to machine learning. The
key observation underlying this work is that state-of-the-art relational query processing …