A survey of big data, high performance computing, and machine learning benchmarks

N Ihde, P Marten, A Eleliemy, G Poerwawinata… - … and Benchmarking: 13th …, 2022 - Springer
In recent years, there has been a convergence of Big Data (BD), High Performance
Computing (HPC), and Machine Learning (ML) systems. This convergence is due to the …

Learning to generate overlap summaries through noisy synthetic data

N Bansal, M Akter, SKK Santu - Proceedings of the 2022 …, 2022 - aclanthology.org
Semantic Overlap Summarization (SOS) is a novel and relatively under-explored
seq-to-seq task which entails summarizing common information from multiple alternate …

Exploring the applicability of test driven development in the big data domain

D Staegemann, M Volk, N Jamous, K Turowski - 2020 - aisel.aisnet.org
Big data analytics and the according applications have gained huge importance in daily life.
This results on the one hand from their versatility and on the other hand from their capability …

SML-Bench – A benchmarking framework for structured machine learning

P Westphal, L Bühmann, S Bin, H Jabeen… - Semantic …, 2019 - content.iospress.com
The availability of structured data has increased significantly over the past decade and
several approaches to learn from structured data have been proposed. These logic-based …

Big data and HPC collocation: Using HPC idle resources for Big Data analytics

M Mercier, D Glesser, Y Georgiou… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Executing Big Data workloads upon High Performance Computing (HPC) infrastructures has
become an attractive way to improve their performances. However, the collocation of HPC …

Hadoop configuration tuning with ensemble modeling and metaheuristic optimization

X Hua, MC Huang, P Liu - IEEE Access, 2018 - ieeexplore.ieee.org
MapReduce is a popular programming model for big data processing. Although the
distributed processing framework Hadoop greatly reduced the development complexity of …

Understanding big data analytics workloads on modern processors

Z Jia, J Zhan, L Wang, C Luo, W Gao… - … on Parallel and …, 2016 - ieeexplore.ieee.org
Big data analytics workloads are very significant ones in modern data centers, and it is more
and more important to characterize their representative workloads and understand their …

Selecting resources for distributed dataflow systems according to runtime targets

L Thamsen, I Verbitskiy, F Schmidt… - 2016 IEEE 35th …, 2016 - ieeexplore.ieee.org
Distributed dataflow systems like Spark or Flink enable users to analyze large datasets.
Users create programs by providing sequential user-defined functions for a set of well …

Trust and big data: a roadmap for research

J Sänger, C Richthammer, S Hassan… - … workshop on database …, 2014 - ieeexplore.ieee.org
We are currently living in the age of Big Data coming along with the challenge to grasp the
golden opportunities at hand. This mixed blessing also dominates the relation between Big …

AIBench Scenario: Scenario-distilling AI benchmarking

W Gao, F Tang, J Zhan, X Wen, L Wang… - 2021 30th …, 2021 - ieeexplore.ieee.org
Modern real-world application scenarios like Internet services consist of a diversity of AI and
non-AI modules with huge code sizes and long and complicated execution paths, which …