Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

RIoTBench: An IoT benchmark for distributed stream processing systems

A Shukla, S Chaturvedi… - … and Computation: Practice …, 2017 - Wiley Online Library
Summary: The Internet of Things (IoT) is an emerging technology paradigm where millions of
sensors and actuators help monitor and manage physical, environmental, and human …

A comprehensive study and review of tuning the performance on database scalability in big data analytics

MR Sundarakumar, G Mahadevan… - Journal of Intelligent …, 2023 - content.iospress.com
In the modern era, digital data processing with a huge volume of data from the repository is
challenging due to various data formats and the extraction techniques available. The …

Adaptive code learning for Spark configuration tuning

C Lin, J Zhuang, J Feng, H Li… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Configuration tuning is vital to optimize the performance of big data analysis platforms like
Spark. Existing methods (e.g., auto-tuning relational databases) are not effective for tuning …

SPBench: a framework for creating benchmarks of stream processing applications

AM Garcia, D Griebler, C Schepke, LG Fernandes - Computing, 2023 - Springer
In a fast-changing data-driven world, real-time data processing systems are becoming
ubiquitous in everyday applications. The increasing data we produce, such as audio, video …

Benchmarking distributed stream processing platforms for IoT applications

A Shukla, Y Simmhan - … and Benchmarking. Traditional-Big Data-Internet …, 2017 - Springer
Abstract: Internet of Things (IoT) is a technology paradigm where millions of sensors monitor,
and help inform or manage, physical, environmental and human systems in real-time. The …

UPLIFT: parallelization strategies for feature transformations in machine learning workloads

A Phani, L Erlbacher, M Boehm - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Data science pipelines are typically exploratory. Integral tasks of such pipelines are
feature transformations, which transform raw data into numerical matrices or tensors for …

SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics

M Li, J Tan, Y Wang, L Zhang, V Salapura - Cluster Computing, 2017 - Springer
Spark has been increasingly employed by industries for big data analytics recently, due to its
resilience, scalability and efficient in-memory distributed programming model. Meanwhile …

BigBench V2: The new and improved BigBench

A Ghazal, T Ivanov, P Kostamaa… - 2017 IEEE 33rd …, 2017 - ieeexplore.ieee.org
Benchmarking Big Data solutions has been gaining a lot of attention from research and
industry. BigBench is one of the most popular benchmarks in this area which was adopted …

Declarative machine learning - a classification of basic properties and types

M Boehm, AV Evfimievski, N Pansare… - arXiv preprint arXiv …, 2016 - arxiv.org
Declarative machine learning (ML) aims at the high-level specification of ML tasks or
algorithms, and automatic generation of optimized execution plans from these specifications …