A survey of data partitioning and sampling methods to support big data analysis

MS Mahmud, JZ Huang, S Salloum… - Big Data Mining and …, 2020 - ieeexplore.ieee.org
Computer clusters with the shared-nothing architecture are the major computing platforms
for big data processing and analysis. In cluster computing, data partitioning and sampling …
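A minimal sketch of two generic building blocks from the area this survey covers: hash partitioning, which assigns each record to one of `n` partitions by key, and reservoir sampling (Algorithm R), which draws a uniform fixed-size sample in one pass. Plain Python lists stand in for distributed data. The function names are hypothetical, and this is not presented as the survey's own method.

```python
import random
from typing import Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def hash_partition(records: Iterable[Tuple[Hashable, T]],
                   num_partitions: int) -> List[List[Tuple[Hashable, T]]]:
    """Assign each (key, value) record to partition hash(key) % num_partitions."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: List[List[Tuple[Hashable, T]]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Equal keys always map to the same partition within one process.
        # Python salts str hashes per process, so a real system would use a stable hash.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `hash_partition([("a", 1), ("b", 2), ("a", 3)], 4)` places both `"a"` records in the same partition. `reservoir_sample(range(1000), 10, random.Random(42))` returns 10 distinct values drawn from the range.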

Sprocket: A serverless video processing framework

L Ao, L Izhikevich, GM Voelker, G Porter - Proceedings of the ACM …, 2018 - dl.acm.org
Sprocket is a highly configurable, stage-based, scalable, serverless video processing
framework that exploits intra-video parallelism to achieve low latency. Sprocket enables …

Effective straggler mitigation: Attack of the clones

G Ananthanarayanan, A Ghodsi, S Shenker… - … USENIX Symposium on …, 2013 - usenix.org
Small jobs, that are typically run for interactive data analyses in datacenters, continue to be
plagued by disproportionately long-running tasks called stragglers. In the production …
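A toy simulation of why cloning helps against stragglers. A job finishes only when its slowest task finishes, and a task launched as several clones finishes when its fastest clone does. The duration model (5% of runs straggle at 10x), the parameters, and the function names are illustrative assumptions, not figures or code from the paper. Cloning costs roughly `clones` times the resources for each cloned task, which is one reason to restrict it to small jobs.

```python
import random
import statistics
from typing import List

def task_duration(rng: random.Random) -> float:
    """Hypothetical duration model: usually about 1s, but 5% of runs are 10x slower."""
    base = rng.uniform(0.9, 1.1)
    return base * 10 if rng.random() < 0.05 else base

def job_completion_time(num_tasks: int, clones: int, rng: random.Random) -> float:
    """A job ends with its slowest task; each task ends with its fastest clone."""
    task_times: List[float] = [
        min(task_duration(rng) for _ in range(clones)) for _ in range(num_tasks)
    ]
    return max(task_times)

def mean_completion(num_tasks: int, clones: int, trials: int, seed: int = 0) -> float:
    """Average job completion time over many simulated jobs."""
    rng = random.Random(seed)
    return statistics.mean(
        job_completion_time(num_tasks, clones, rng) for _ in range(trials)
    )
```

Comparing `mean_completion(10, 1, 2000)` with `mean_completion(10, 3, 2000)` shows the effect. Without clones, a 10-task job hits at least one straggler about 40% of the time (1 - 0.95^10). With three clones per task, that probability becomes negligible.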

SkewTune: mitigating skew in MapReduce applications

YC Kwon, M Balazinska, B Howe, J Rolia - Proceedings of the 2012 …, 2012 - dl.acm.org
We present an automatic skew mitigation approach for user-defined MapReduce programs
and present SkewTune, a system that implements this approach as a drop-in replacement …

Hawk: Hybrid datacenter scheduling

P Delgado, F Dinu, AM Kermarrec… - 2015 USENIX Annual …, 2015 - usenix.org
This paper is included in the Proceedings of
the 2015 USENIX Annual Technical Conference (USENIX ATC ’15). July 8–10, 2015 • …

Improving MapReduce performance using smart speculative execution strategy

Q Chen, C Liu, Z Xiao - IEEE Transactions on Computers, 2013 - ieeexplore.ieee.org
MapReduce is a widely used parallel computing framework for large scale data processing.
The two major performance metrics in MapReduce are job execution time and cluster …

Health big data analytics: A technology survey

G Harerimana, B Jang, JW Kim, HK Park - IEEE Access, 2018 - ieeexplore.ieee.org
Because of the vast availability of data, there has been an additional focus on the health
industry and an increasing number of studies that aim to leverage the data to improve …

GRASS: Trimming stragglers in approximation analytics

G Ananthanarayanan, MCC Hung, X Ren… - … USENIX symposium on …, 2014 - usenix.org
In big data analytics, timely results, even if based on only part of the data, are often good
enough. For this reason, approximation jobs, which have deadline or error bounds and …

Locality-aware reduce task scheduling for MapReduce

M Hammoud, MF Sakr - 2011 IEEE Third International …, 2011 - ieeexplore.ieee.org
MapReduce offers a promising programming model for big data processing. Inspired by
functional languages, MapReduce allows programmers to write functional-style code which …
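A minimal single-process sketch of the functional-style MapReduce programming model, using word count. A user-supplied map function emits (key, value) pairs, a shuffle groups those pairs by key, and a user-supplied reduce function folds each group. A plain dictionary stands in for a cluster runtime, and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical. In a real deployment, the shuffle is where intermediate data crosses the network to whichever nodes run the reduce tasks, which is why reduce task placement matters.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

def map_fn(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: fold all values for one key into a single result."""
    return word, sum(counts)

def run_mapreduce(inputs: Iterable[str],
                  mapper: Callable[[str], Iterable[Tuple[str, int]]],
                  reducer: Callable[[str, List[int]], Tuple[str, int]]) -> Dict[str, int]:
    # Shuffle: group intermediate values by key (a real runtime moves these between nodes).
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce each key group independently; a cluster would run these as parallel tasks.
    return dict(reducer(key, values) for key, values in groups.items())
```

For example, `run_mapreduce(["the cat", "The dog"], map_fn, reduce_fn)` counts "the" twice because `map_fn` lowercases its input.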

Caerus: NIMBLE task scheduling for serverless analytics

H Zhang, Y Tang, A Khandelwal, J Chen… - 18th USENIX Symposium …, 2021 - usenix.org
Serverless platforms facilitate transparent resource elasticity and fine-grained billing, making
them an attractive choice for data analytics. We find that while server-centric analytics …