AdBench: a complete benchmark for modern data pipelines

M Bhandarkar - … Evaluation and Benchmarking. Traditional-Big Data …, 2017 - Springer
Since the introduction of Apache YARN, which modularly separated resource management
and scheduling from the distributed programming frameworks, a multitude of YARN-native …

CirroData: Yet another SQL-on-Hadoop data analytics engine with high performance

ZH Jin, H Shi, YX Hu, L Zha, X Lu - Journal of Computer Science and …, 2020 - Springer
This paper presents CirroData, a high-performance SQL-on-Hadoop system designed for
Big Data analytics workloads. As a home-grown enterprise-level online analytical …

ADABench - towards an industry standard benchmark for advanced analytics

T Rabl, C Brücke, P Härtling, S Stars… - … for the Era of Cloud (s) …, 2020 - Springer
The digital revolution, rapidly decreasing storage cost, and remarkable results achieved by
state of the art machine learning (ML) methods are driving widespread adoption of ML …

Run-time performance optimization of a BigData query language

Y Liu, P Dube, SC Gray - Proceedings of the 5th ACM/SPEC …, 2014 - dl.acm.org
JAQL is a query language for large-scale data that connects BigData analytics and
MapReduce framework together. Also an IBM product, JAQL's performance is critical for IBM …

Weld: A common runtime for high performance data analytics

S Palkar, JJ Thomas, A Shanbhag, D Narayanan… - 2017 - dspace.mit.edu
© 2017 Conference on Innovative Data Systems Research (CIDR). All rights reserved.
Modern analytics applications combine multiple functions from different libraries and …

A study of SQL-on-Hadoop systems

Y Chen, X Qin, H Bian, J Chen, Z Dong, X Du… - Big Data Benchmarks …, 2014 - Springer
Hadoop is now the de facto standard for storing and processing big data, not only for
unstructured data but also for some structured data. As a result, providing SQL analysis …

HaLoop: Efficient iterative data processing on large clusters

Y Bu, B Howe, M Balazinska, MD Ernst - Proceedings of the VLDB …, 2010 - dl.acm.org
The growing demand for large-scale data mining and data analysis applications has led
both industry and academia to design new types of highly scalable data-intensive computing …

Apache Tez: A unifying framework for modeling and building data processing applications

B Saha, H Shah, S Seth, G Vijayaraghavan… - Proceedings of the …, 2015 - dl.acm.org
The broad success of Hadoop has led to a fast-evolving and diverse ecosystem of
application engines that are building upon the YARN resource management layer. The open …

Challenging SQL-on-Hadoop performance with Apache Druid

J Correia, C Costa, MY Santos - International Conference on Business …, 2019 - Springer
In Big Data, SQL-on-Hadoop tools usually provide satisfactory performance for
processing vast amounts of data, although new emerging tools may be an alternative. This …

Impala: A modern, open-source SQL engine for Hadoop

M Bittorf, T Bobrovytsky, C Erickson… - Proceedings of the …, 2015 - pages.cs.wisc.edu
Cloudera Impala is a modern, open-source MPP SQL engine architected from the ground up
for the Hadoop data processing environment. Impala provides low latency and high …