Decomposed bounded floats for fast compression and queries

C Liu, H Jiang, J Paparrizos, AJ Elmore - Proceedings of the VLDB …, 2021 - dl.acm.org
Modern data-intensive applications often generate large amounts of low precision float data
with a limited range of values. Despite the prevalence of such data, there is a lack of an …

Limits of reproducibility and hydrodynamic noise in atmospheric regional modelling

B Geyer, T Ludwig, H von Storch - Communications Earth & …, 2021 - nature.com
Reproducibility of research results is a fundamental quality criterion in science; thus,
computer architecture effects on simulation results must be determined. Here, we investigate …

Pushing ML Predictions Into DBMSs

M Paganelli, P Sottovia, K Park… - … on Knowledge and …, 2023 - ieeexplore.ieee.org
In the past decade, many approaches have been suggested to execute ML workloads on a
DBMS. However, most of them have looked at in-DBMS ML from a training perspective …

Chasing similarity: Distribution-aware aggregation scheduling

F Liu, A Salmasi, S Blanas, A Sidiropoulos - Proceedings of the VLDB …, 2018 - dl.acm.org
Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP
BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an …

SimFS: a simulation data virtualizing file system interface

S Di Girolamo, P Schmid… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Nowadays simulations can produce petabytes of data to be stored in parallel filesystems or
large-scale databases. This data is accessed over the course of decades often by thousands …

Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics

K Assogba, B Nicolae, H Van Dam… - … of the SC'23 Workshops of …, 2023 - dl.acm.org
High-performance computing applications are increasingly integrating checkpointing
libraries for reproducibility analytics. However, capturing an entire checkpoint history for …

Fast and Effective Compression for IoT Systems

C Liu - 2022 - search.proquest.com
The Internet of Things (IoT) enables connections of trillions of sensors and data
collection for connectivity and analytics. The amount of IoT-generated data has exploded …

The effect of Computational Environments on Big Data Processing Pipelines in Neuroimaging

MA Salari - 2021 - spectrum.library.concordia.ca
Variations in computational infrastructures, including operating systems, software versions,
and hardware architectures, introduce variability in neuroimaging analyses that could affect …

Chasing Similarity: Distribution-aware Aggregation Scheduling (Extended Version)

F Liu, A Salmasi, S Blanas, A Sidiropoulos - arXiv preprint arXiv …, 2018 - arxiv.org
Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP
BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an …

Application-driven network and storage optimizations

S Di Girolamo - 2021 - research-collection.ethz.ch
During the last few decades, we transitioned into the data-driven era, where scientific
models are being computed on supercomputers and large datacenters. The …