hatS: A heterogeneity-aware tiered storage for Hadoop

KR Krish, A Anwar, AR Butt - 2014 14th IEEE/ACM …, 2014 - ieeexplore.ieee.org
Hadoop has become the de-facto large-scale data processing framework for modern
analytics applications. A major obstacle for sustaining high performance and scalability in …

OctopusFS: A distributed file system with tiered storage management

E Kakoulli, H Herodotou - Proceedings of the 2017 ACM international …, 2017 - dl.acm.org
The ever-growing data storage and I/O demands of modern large-scale data analytics are
challenging the current distributed storage systems. A promising trend is to exploit the recent …

Automating distributed tiered storage management in cluster computing

H Herodotou, E Kakoulli - arXiv preprint arXiv:1907.02394, 2019 - arxiv.org
Data-intensive platforms such as Hadoop and Spark are routinely used to process massive
amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and …

CAST: Tiering storage for data analytics in the cloud

Y Cheng, MS Iqbal, A Gupta, AR Butt - Proceedings of the 24th …, 2015 - dl.acm.org
Enterprises are increasingly moving their big data analytics to the cloud with the goal of
reducing costs without sacrificing application performance. Cloud service providers offer …

High-performance design of YARN MapReduce on modern HPC clusters with Lustre and RDMA

M Wasi-ur-Rahman, X Lu, NS Islam… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
The viability and benefits of running MapReduce over modern High Performance Computing
(HPC) clusters, with high performance interconnects and parallel file systems, have attracted …

On efficient hierarchical storage for big data processing

KR Krish, B Wadhwa, MS Iqbal… - 2016 16th IEEE/ACM …, 2016 - ieeexplore.ieee.org
A promising trend in storage management for big data frameworks, such as Hadoop and
Spark, is the emergence of heterogeneous and hybrid storage systems that employ different …

A comprehensive study of MapReduce over Lustre for intermediate data placement and shuffle strategies on HPC clusters

MD Wasi-ur-Rahman, NS Islam, X Lu… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
With high performance interconnects and parallel file systems, running MapReduce over
modern High Performance Computing (HPC) clusters has attracted much attention due to its …

SwiftAnalytics: Optimizing object storage for big data analytics

L Rupprecht, R Zhang, B Owen… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Due to their scalability and low cost, object-based storage systems are an attractive storage
solution and widely deployed. To gain valuable insight from the data residing in object …

Too big to eat: Boosting analytics data ingestion from object stores with Scoop

Y Moatti, E Rom, R Gracia-Tinedo… - 2017 IEEE 33rd …, 2017 - ieeexplore.ieee.org
Extracting value from data stored in object stores, such as OpenStack Swift and Amazon S3,
can be problematic in common scenarios where analytics frameworks and object stores run …

φSched: A heterogeneity-aware Hadoop workflow scheduler

KR Krish, A Anwar, AR Butt - 2014 IEEE 22nd International …, 2014 - ieeexplore.ieee.org
Enterprise Hadoop applications now routinely comprise complex workflows that are
managed by specialized workflow schedulers such as Oozie. The resources are assumed to …