Understanding the behavior of in-memory computing workloads

T Jiang, Q Zhang, R Hou, L Chai… - 2014 IEEE …, 2014 - ieeexplore.ieee.org
The increasing demands of big data applications have led researchers and practitioners to
turn to in-memory computing to speed processing. For instance, the Apache Spark …

Performance characterization of in-memory data analytics on a modern cloud server

AJ Awan, M Brorsson, V Vlassov… - 2015 IEEE Fifth …, 2015 - ieeexplore.ieee.org
In the last decade, data analytics have rapidly progressed from traditional disk-based
processing to modern in-memory processing. However, little effort has been devoted at …

Towards automatic memory tuning for in-memory big data analytics in clusters

AK Koliopoulos, P Yiapanis, F Tekiner… - … Congress on Big …, 2016 - ieeexplore.ieee.org
Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but
imposes performance overheads due to only supporting on-disk data. Data Analytic …

Micro-architectural characterization of Apache Spark on batch and stream processing workloads

AJ Awan, M Brorsson, V Vlassov… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
While cluster computing frameworks are continuously evolving to provide real-time data
analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics …

Memory requirements of Hadoop, Spark, and MPI based big data applications on commodity server class architectures

HM Makrani, H Homayoun - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Emerging big data frameworks require computational resources and memory subsystems
that can naturally scale to manage massive amounts of diverse data. Given the large size …

Main-memory requirements of big data applications on commodity server platform

HM Makrani, S Rafatirad, A Houmansadr… - 2018 18th IEEE/ACM …, 2018 - ieeexplore.ieee.org
The emergence of big data frameworks requires computational and memory resources that
can naturally scale to manage massive amounts of diverse data. It is currently unclear …

Chopper: Optimizing data partitioning for in-memory data analytics frameworks

AK Paul, W Zhuang, L Xu, M Li… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
The performance of in-memory based data analytic frameworks such as Spark is
significantly affected by how data is partitioned. This is because the partitioning effectively …

Spark: A big data processing platform based on memory computing

Z Han, Y Zhang - 2015 Seventh International Symposium on …, 2015 - ieeexplore.ieee.org
Spark is a memory-based computing framework which has a better ability of computing and
fault tolerance, supports batch, interactive, iterative and flow calculations. In this paper, we …

High-performance design of Apache Spark with RDMA and its benefits on various workloads

X Lu, D Shankar, S Gugnani… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
The in-memory data processing framework, Apache Spark, has been stealing the limelight
for low-latency interactive applications, iterative and batch computations. Our early …

Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters

NS Islam, M Wasi-ur-Rahman, X Lu… - … Conference on Big …, 2015 - ieeexplore.ieee.org
For data-intensive computing, the low throughput of the existing disk-bound storage systems
is a major bottleneck. Recent emergence of the in-memory file systems with heterogeneous …