Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

A survey of data partitioning and sampling methods to support big data analysis

MS Mahmud, JZ Huang, S Salloum… - Big Data Mining and …, 2020 - ieeexplore.ieee.org
Computer clusters with the shared-nothing architecture are the major computing platforms
for big data processing and analysis. In cluster computing, data partitioning and sampling …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

NoScope: optimizing neural network queries over video at scale

D Kang, J Emmons, F Abuzaid, P Bailis… - arXiv preprint arXiv …, 2017 - arxiv.org
Recent advances in computer vision, in the form of deep neural networks, have made it
possible to query increasing volumes of video data with high accuracy. However, neural …

Automated machine learning: State-of-the-art and open challenges

R Elshawi, M Maher, S Sakr - arXiv preprint arXiv:1906.02287, 2019 - arxiv.org
With the continuous and vast increase in the amount of data in our digital world, it has been
acknowledged that the number of knowledgeable data scientists can not scale to address …

Focus: Querying large video datasets with low latency and low cost

K Hsieh, G Ananthanarayanan, P Bodik… - … USENIX Symposium on …, 2018 - usenix.org
Large volumes of videos are continuously recorded from cameras deployed for traffic control
and surveillance with the goal of answering “after the fact” queries: identify video frames with …

Data lifecycle challenges in production machine learning: a survey

N Polyzotis, S Roy, SE Whang, M Zinkevich - ACM SIGMOD Record, 2018 - dl.acm.org
Machine learning has become an essential tool for gleaning knowledge from data and
tackling a diverse set of computationally hard tasks. However, the accuracy of a machine …

C-Store: a column-oriented DBMS

M Stonebraker, DJ Abadi, A Batkin, X Chen… - … Databases Work: the …, 2018 - dl.acm.org
This paper presents the design of a read-optimized relational DBMS that contrasts sharply
with most current systems, which are write-optimized. Among the many differences in its …

A Berkeley view of systems challenges for AI

I Stoica, D Song, RA Popa, D Patterson… - arXiv preprint arXiv …, 2017 - arxiv.org
With the increasing commoditization of computer vision, speech recognition and machine
translation systems and the widespread deployment of learning-based back-end …

DeepAID: Interpreting and improving deep learning-based anomaly detection in security applications

D Han, Z Wang, W Chen, Y Zhong, S Wang… - Proceedings of the …, 2021 - dl.acm.org
Unsupervised Deep Learning (DL) techniques have been widely used in various security-
related anomaly detection applications, owing to the great promise of being able to detect …