ZNS: Avoiding the block interface tax for flash-based SSDs

M Bjørling, A Aghayev, H Holmberg… - 2021 USENIX Annual …, 2021 - usenix.org
The Zoned Namespace (ZNS) interface represents a new division of functionality between
host software and flash-based SSDs. Current flash-based SSDs maintain the decades-old …

LightNVM: The Linux Open-Channel SSD subsystem

M Bjørling, J Gonzalez, P Bonnet - 15th USENIX Conference on File and …, 2017 - usenix.org
As Solid-State Drives (SSDs) become commonplace in data-centers and storage arrays,
there is a growing demand for predictable latency. Traditional SSDs, serving block I/Os, fail …

Why does the cloud stop computing? Lessons from hundreds of service outages

HS Gunawi, M Hao, RO Suminto, A Laksono… - Proceedings of the …, 2016 - dl.acm.org
We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed
1247 headline news and public post-mortem reports that detail 597 unplanned outages that …

Tiny-tail flash: Near-perfect elimination of garbage collection tail latencies in NAND SSDs

S Yan, H Li, M Hao, MH Tong… - ACM Transactions on …, 2017 - dl.acm.org
Flash storage has become the mainstream destination for storage users. However, SSDs do
not always deliver the performance that users expect. The core culprit of flash performance …

Making disk failure predictions SMARTer!

S Lu, B Luo, T Patel, Y Yao, D Tiwari… - 18th USENIX Conference …, 2020 - usenix.org
Disk drives are one of the most commonly replaced hardware components and continue to
pose challenges for accurate failure prediction. In this work, we present analysis and …

File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution

A Aghayev, S Weil, M Kuchnik, M Nelson… - Proceedings of the 27th …, 2019 - dl.acm.org
For a decade, the Ceph distributed file system followed the conventional wisdom of building
its storage backend on top of local file systems. This is a preferred choice for most distributed …

Fail-slow at scale: Evidence of hardware performance faults in large production systems

HS Gunawi, RO Suminto, R Sears, C Golliher… - ACM Transactions on …, 2018 - dl.acm.org
Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of
fail-slow hardware incidents, collected from large-scale cluster deployments in 14 …

The CASE of FEMU: Cheap, accurate, scalable and extensible flash emulator

H Li, M Hao, MH Tong, S Sundararaman… - … USENIX Conference on …, 2018 - usenix.org
We present FEMU, a QEMU-based flash emulator for fostering future full-stack
software/hardware SSD research, with the following four "CASE" benefits. FEMU is cheap …

TaxDC: A taxonomy of non-deterministic concurrency bugs in datacenter distributed systems

T Leesatapornwongsa, JF Lukman, S Lu… - Proceedings of the …, 2016 - dl.acm.org
We present TaxDC, the largest and most comprehensive taxonomy of non-deterministic
concurrency bugs in distributed systems. We study 104 distributed concurrency (DC) bugs …

LinnOS: Predictability on unpredictable flash storage with a light neural network

M Hao, L Toksoz, N Li, EE Halim, H Hoffmann… - … USENIX Symposium on …, 2020 - usenix.org
This paper presents LinnOS, an operating system that leverages a light neural network for
inferring SSD performance at a very fine—per-IO—granularity and helps parallel storage …