Cross-platform resource scheduling for Spark and MapReduce on YARN

D Cheng, X Zhou, P Lama, J Wu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
While MapReduce is inherently designed for batch and high throughput processing
workloads, there is an increasing demand for non-batch processes on big data, e.g., …

HPC AI500: a benchmark suite for HPC AI systems

Z Jiang, W Gao, L Wang, X Xiong, Y Zhang… - … , and Optimizing: First …, 2019 - Springer
In recent years, with the trend of applying deep learning (DL) in high performance scientific
computing, the unique characteristics of emerging DL workloads in HPC raise great …

WISE: Predicting the performance of sparse matrix vector multiplication with machine learning

S Yesil, A Heidarshenas, A Morrison… - Proceedings of the 28th …, 2023 - dl.acm.org
Sparse Matrix-Vector Multiplication (SpMV) is an essential sparse kernel. Numerous
methods have been developed to accelerate SpMV. However, no single method consistently …

vbench: Benchmarking video transcoding in the cloud

A Lottarini, A Ramirez, J Coburn, MA Kim… - ACM SIGPLAN …, 2018 - dl.acm.org
This paper presents vbench, a publicly available benchmark for cloud video services. We
are the first study, to the best of our knowledge, to characterize the emerging video-as-a …

Model-driven scheduling for distributed stream processing systems

A Shukla, Y Simmhan - Journal of Parallel and Distributed Computing, 2018 - Elsevier
Distributed Stream Processing Systems (DSPS) are “Fast Data” platforms that allow
streaming applications to be composed and executed with low latency on commodity …

An efficient industrial big-data engine

P Basanta-Val - IEEE Transactions on Industrial Informatics, 2017 - ieeexplore.ieee.org
Current trends in industrial systems opt for the use of different big-data engines as a means
to process huge amounts of data that cannot be processed with an ordinary infrastructure …

Data motifs: A lens towards fully understanding big data and AI workloads

W Gao, J Zhan, L Wang, C Luo, D Zheng… - Proceedings of the 27th …, 2018 - dl.acm.org
The complexity and diversity of big data and AI workloads make understanding them difficult
and challenging. This paper proposes a new approach to modelling and characterizing big …

An investigation of performance analysis of anomaly detection techniques for big data in SCADA systems

M Ahmed, A Anwar, AN Mahmood… - EAI Endorsed …, 2015 - publications.eai.eu
Anomaly detection is an important aspect of data mining, where the main objective is to
identify anomalous or unusual data from a given dataset. However, there is no formal …

HyPart: A hybrid technique for practical memory bandwidth partitioning on commodity servers

J Park, S Park, M Han, J Hyun, W Baek - Proceedings of the 27th …, 2018 - dl.acm.org
Memory bandwidth is a highly performance-critical shared resource on modern computer
systems. To prevent the contention on memory bandwidth among the collocated workloads …

Big-data NoSQL databases: A comparison and analysis of “Big-Table”, “DynamoDB”, and “Cassandra”

S Kalid, A Syed, A Mohammad… - 2017 IEEE 2nd …, 2017 - ieeexplore.ieee.org
The growth and enhancement of technology in the corporate society has led to data storage
and confidentiality issues. The problem arises from the management of trillions of data …