MRTuner: a toolkit to enable holistic optimization for MapReduce jobs

A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org

Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

Cited by 113 · Related articles · All 10 versions

[PDF] usenix.org

Ernest: Efficient performance prediction for Large-Scale advanced analytics

S Venkataraman, Z Yang, M Franklin, B Recht… - … USENIX Symposium on …, 2016 - usenix.org

Recent workload trends indicate rapid growth in the deployment of machine learning,
genomics and scientific workloads on cloud computing infrastructure. However, efficiently …

Cited by 633 · Related articles · All 13 versions

[PDF] researchgate.net

Clash of the titans: MapReduce vs. Spark for large scale data analytics

J Shi, Y Qiu, UF Minhas, L Jiao, C Wang… - Proceedings of the …, 2015 - dl.acm.org

MapReduce and Spark are two very popular open source cluster computing frameworks for
large scale data analytics. These frameworks hide the complexity of task parallelism and …

Cited by 317 · Related articles · All 11 versions

[PDF] arxiv.org

The many faces of data-centric workflow optimization: a survey

G Kougka, A Gounaris, A Simitsis - … Journal of Data Science and Analytics, 2018 - Springer

Workflow technology is rapidly evolving and, rather than being limited to modeling the
control flow in business processes, is becoming a key mechanism to perform advanced data …

Cited by 52 · Related articles · All 6 versions

[PDF] acm.org

Black or white? how to develop an autotuner for memory-based analytics

M Kunjir, S Babu - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org

There is a lot of interest today in building autonomous (or, self-driving) data processing
systems. An emerging school of thought is to leverage AI-driven "black box" algorithms for …

Cited by 68 · Related articles · All 5 versions

[PDF] helsinki.fi

Speedup your analytics: Automatic parameter tuning for databases and big data systems

J Lu, Y Chen, H Herodotou, S Babu - Proceedings of the VLDB …, 2019 - dl.acm.org

Database and big data analytics systems such as Hadoop and Spark have a large number
of configuration parameters that control memory distribution, I/O optimization, parallelism …

Cited by 69 · Related articles · All 18 versions

[PDF] vt.edu

Memtune: Dynamic memory management for in-memory data analytic platforms

L Xu, M Li, L Zhang, AR Butt, Y Wang… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org

Memory is a crucial resource for big data processing frameworks such as Spark and M3R,
where the memory is used both for computation and for caching intermediate storage data …

Cited by 96 · Related articles · All 4 versions

[PDF] upc.edu

Dynamic configuration of partitioning in Spark applications

A Gounaris, G Kougka, R Tous… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org

Spark has become one of the main options for large-scale analytics running on top of shared-
nothing clusters. This work aims to make a deep dive into the parallelism configuration and …

Cited by 81 · Related articles · All 10 versions

[PDF] researchgate.net

Resource elasticity for large-scale machine learning

B Huang, M Boehm, Y Tian, B Reinwald… - Proceedings of the …, 2015 - dl.acm.org

Declarative large-scale machine learning (ML) aims at flexible specification of ML algorithms
and automatic generation of hybrid runtime plans ranging from single node, in-memory …

Cited by 76 · Related articles · All 5 versions

[PDF] arxiv.org

Learning-based automatic parameter tuning for big data analytics frameworks

L Bao, X Liu, W Chen - … Conference on Big Data (Big Data), 2018 - ieeexplore.ieee.org

Big data analytics frameworks (BDAFs) have been widely used for data processing
applications. These frameworks provide a large number of configuration parameters to …

Cited by 43 · Related articles · All 4 versions
