Rearchitecting the TCP Stack for I/O-Offloaded Content Delivery

T Kim, DM Ng, J Gong, Y Kwon, M Yu… - 20th USENIX Symposium …, 2023 - usenix.org
The recent advancement of high-bandwidth I/O devices enables scalable delivery of online
content. Unfortunately, the traditional programming model for content servers has a tight …

Libnvmmio: Reconstructing Software IO Path with Failure-Atomic Memory-Mapped Interface

J Choi, J Hong, Y Kwon, H Han - 2020 USENIX Annual Technical …, 2020 - usenix.org
Fast non-volatile memory (NVM) technology changes the landscape of file systems. A series
of research efforts to overcome the traditional file system designs that limit NVM …

GPU-initiated on-demand high-throughput storage access in the BaM system architecture

Z Qureshi, VS Mailthody, I Gelado, S Min… - Proceedings of the 28th …, 2023 - dl.acm.org
Graphics Processing Units (GPUs) have traditionally relied on the host CPU to initiate
access to the data storage. This approach is well-suited for GPU applications with known …

Efficient Memory Mapped File I/O for In-Memory File Systems

J Choi, J Kim, H Han - 9th USENIX Workshop on Hot Topics in Storage …, 2017 - usenix.org
Recently, with the emergence of low-latency NVM storage, software overhead has become a
greater bottleneck than storage latency, and memory mapped file I/O has gained attention as …

FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization

D Kwon, J Boo, D Kim, J Kim - 14th USENIX Symposium on Operating …, 2020 - usenix.org
Emerging big-data workloads with massive I/O processing require fast, scalable, and flexible
storage virtualization support. Hardware-assisted virtualization can achieve reasonable …

Data motion acceleration: Chaining cross-domain multi accelerators

ST Wang, H Xu, A Mamandipoor… - … Symposium on High …, 2024 - ieeexplore.ieee.org
There has been an arms race for devising accelerators for deep learning in recent years.
However, real-world applications are not only neural networks but often span across …

BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment

Y Chen, J Xu, C Wei, Y Wang, X Yuan… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Bare-metal instances are crucial for high-value, mission-critical applications on the cloud.
Tenants exclusively use these dedicated hardware resources. Local virtualized disks are …

BaM: A case for enabling fine-grain high throughput GPU-orchestrated access to storage

Z Qureshi, VS Mailthody, I Gelado, SW Min… - arXiv preprint arXiv …, 2022 - academia.edu
Accelerators like Graphics Processing Units (GPUs) have been increasingly deployed in
modern data centers because of their compute capabilities and memory bandwidth. These …

FlashGPU: Placing new flash next to GPU cores

J Zhang, M Kwon, H Kim, H Kim, M Jung - Proceedings of the 56th …, 2019 - dl.acm.org
We propose FlashGPU, a new GPU architecture that tightly blends new flash (Z-NAND) with
massive GPU cores. Specifically, we replace global memory with Z-NAND that exhibits ultra …

Solros: A data-centric operating system architecture for heterogeneous computing

C Min, W Kang, M Kumar, S Kashyap, S Maass… - Proceedings of the …, 2018 - dl.acm.org
We propose Solros, a new operating system architecture for heterogeneous systems that
comprises fast host processors, slow but massively parallel co-processors, and fast I/O …