Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences …
X Zeng, Y Hui, J Shen, A Pavlo, W McKinney… - arXiv preprint arXiv …, 2023 - arxiv.org
Columnar storage is a core component of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide …
In-network processing, where data is processed by special-purpose devices as it passes over the network, is showing great promise at improving application performance, in …
T Ivanov, M Pergolesi - Concurrency and Computation …, 2020 - Wiley Online Library
Columnar file formats provide an efficient way to store data to be queried by SQL‐on‐Hadoop engines. Related works consider the performance of processing engine and file …
A Trivedi, L Wang, H Bal, A Iosup - 3rd USENIX Workshop on Hot Topics …, 2020 - usenix.org
Edge computing is an emerging computing paradigm where data is generated and processed in the field using distributed computing devices. Many applications such as real …
P Stuedi, A Trivedi, J Pfefferle, A Klimovic… - 2019 USENIX Annual …, 2019 - usenix.org
Efficiently exchanging temporary data between tasks is critical to the end-to-end performance of many data processing frameworks and applications. Unfortunately, the …
J Yun, B Tak, WS Han - Proceedings of the VLDB Endowment, 2024 - dl.acm.org
The schemalessness, one of the major advantages of JSON representation format, comes with high penalties in querying and operations by denying various critical functions such as …
Y Feng, Z Liu, Y Zhao, T Jin, Y Wu, Y Zhang… - 2021 USENIX Annual …, 2021 - usenix.org
The scale of computer clusters has grown significantly in recent years. Today, a cluster may have 100 thousand machines and execute billions of tasks, especially short tasks, each day …
With the ever-increasing dataset sizes, several file formats such as Parquet, ORC, and Avro have been developed to store data efficiently, save the network, and interconnect bandwidth …