A software reference architecture for semantic-aware Big Data systems

S Nadal, V Herrero, O Romero, A Abelló… - Information and software …, 2017 - Elsevier
Abstract Context: Big Data systems are a class of software systems that ingest, store,
process and serve massive amounts of heterogeneous data, from multiple sources. Despite …

Albis: High-Performance File Format for Big Data Systems

A Trivedi, P Stuedi, J Pfefferle, A Schuepbach… - 2018 USENIX Annual …, 2018 - usenix.org
Over the last decade, a variety of external file formats such as Parquet, ORC, Arrow, etc.,
have been developed to store large volumes of relational data in the cloud. As high …

Interactive data exploration of distributed raw files: A systematic mapping study

A Alvarez-Ayllon, M Palomo-Duarte, JM Dodero - IEEE Access, 2018 - ieeexplore.ieee.org
When exploring large amounts of data without a clear target, providing an interactive
experience becomes really difficult, since this tentative inspection usually defeats any early …

A cost-based storage format selector for materialized results in big data frameworks

RF Munir, A Abelló, O Romero, M Thiele… - Distributed and Parallel …, 2020 - Springer
Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-
scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of …

Intermediate results materialization selection and format for data-intensive flows

RF Munir, S Nadal, O Romero, A Abelló… - Fundamenta …, 2018 - content.iospress.com
Data-intensive flows deploy a variety of complex data transformations to build information
pipelines from data sources to different end users. As data are processed, these workflows …

ATUN-HL: Auto tuning of hybrid layouts using workload and data characteristics

RF Munir, A Abelló, O Romero, M Thiele… - Advances in Databases …, 2018 - Springer
Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither
normalized nor transformed) is typically dumped into a distributed engine, where it is …

A cost-based storage format selector for materialization in big data frameworks

RF Munir, A Abelló, O Romero, M Thiele… - arXiv preprint arXiv …, 2018 - arxiv.org
Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-
scale analysis simultaneously. Typically, users deploy Data-Intensive Workflows (DIWs) for …

[PDF] This is a self-archiving document (postprint, accepted version)

M Rudolf, H Voigt, C Bornhövd, W Lehner - core.ac.uk
The past few years have seen a tremendous increase in often irregularly structured data that
can be represented most naturally and efficiently in the form of graphs. Making sense of …

Storage format selection and optimization for materialized intermediate results in data-intensive flows

RF Munir - 2019 - upcommons.upc.edu
Modern organizations produce and collect large volumes of data that need to be processed
repeatedly and quickly for gaining business insights. For such processing, typically, Data …

[PDF] On-demand Data Integration

SN Francesch - cs.upc.edu
Data has an undoubted impact on society. Storing and processing large amounts of
available data is currently one of the key success factors for an organization. In order to carry …