A comprehensive study of the past, present, and future of data deduplication

W Xia, H Jiang, D Feng, F Douglis… - Proceedings of the …, 2016 - ieeexplore.ieee.org
Data deduplication, an efficient approach to data reduction, has gained increasing attention
and popularity in large-scale storage systems due to the explosive growth of digital data. It …
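The basic mechanism behind the data reduction described above can be sketched as a fingerprint-indexed chunk store: each unique chunk is kept once, and each file becomes a "recipe" of fingerprints. This is a minimal illustration, assuming fixed-size chunking and SHA-256 fingerprints. Many deployed systems use content-defined chunking and far more elaborate indexes instead. `DedupStore` and its methods are hypothetical names, not an API from any of the listed papers.

```python
import hashlib


class DedupStore:
    """Toy chunk store: each unique chunk is stored once, keyed by its SHA-256 fingerprint."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # fingerprint -> chunk contents

    def write(self, data: bytes) -> list[str]:
        """Split data into fixed-size chunks; return the file recipe (ordered fingerprints)."""
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # store only the first copy of each chunk
            recipe.append(fp)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild a file by concatenating its chunks in recipe order."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
r1 = store.write(b"AAAABBBBAAAA")
r2 = store.write(b"BBBBCCCC")
print(store.read(r1) == b"AAAABBBBAAAA")  # True
print(store.stored_bytes())  # 12 bytes stored for 20 bytes written
```

The two writes contain five 4-byte chunks but only three distinct ones (`AAAA`, `BBBB`, `CCCC`), so the store holds 12 bytes while both files remain fully recoverable from their recipes.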

Design tradeoffs for data deduplication performance in backup workloads

M Fu, D Feng, Y Hua, X He, Z Chen, W Xia… - … USENIX Conference on …, 2015 - usenix.org
Data deduplication has become a standard component in modern backup systems. In order
to understand the fundamental tradeoffs in each of its design choices (such as prefetching …

DupHunter: Flexible High-Performance Deduplication for Docker Registries

N Zhao, H Albahar, S Abraham, K Chen… - 2020 USENIX Annual …, 2020 - usenix.org
Containers are increasingly used in a broad spectrum of applications from cloud services to
storage to supporting emerging edge computing paradigm. This has led to an explosive …

HPDedup: A hybrid prioritized data deduplication mechanism for primary storage in the cloud

H Wu, C Wang, Y Fu, S Sakr, L Zhu, K Lu - arXiv preprint arXiv:1702.08153, 2017 - arxiv.org
Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud
service providers as well as reduces the cost of users for using cloud services. Existing …

Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression

Y Zhang, W Xia, D Feng, H Jiang, Y Hua… - 17th USENIX Conference …, 2019 - usenix.org
In storage systems, delta compression is often used as a complementary data reduction
technique for data deduplication because it is able to eliminate redundancy among the non …
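Delta compression, mentioned in the Finesse entry above, removes redundancy between chunks that are similar but not identical, which exact-match deduplication cannot catch. The sketch below is a minimal greedy delta encoder, assuming 4-byte k-gram matching and a simple copy/insert instruction format. It is not Finesse's technique (Finesse addresses resemblance detection, i.e. choosing the base chunk), and `delta_encode`/`delta_decode` are hypothetical names.

```python
def delta_encode(base: bytes, target: bytes, k: int = 4) -> list[tuple]:
    """Encode target as ("copy", offset, length) / ("insert", bytes) ops against base."""
    index: dict[bytes, int] = {}
    for i in range(len(base) - k + 1):
        index.setdefault(base[i:i + k], i)  # remember the first occurrence of each k-gram
    ops: list[tuple] = []
    literal = bytearray()
    i = 0
    while i < len(target):
        off = index.get(target[i:i + k])  # slices shorter than k never match
        if off is None:
            literal.append(target[i])  # no match: emit this byte literally
            i += 1
            continue
        # Extend the match beyond the k-gram as far as both buffers agree.
        length = k
        while (off + length < len(base) and i + length < len(target)
               and base[off + length] == target[i + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def delta_decode(base: bytes, ops: list[tuple]) -> bytes:
    """Apply copy/insert ops to base to reconstruct the target."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += base[off:off + length]
        else:
            out += op[1]
    return bytes(out)


base = b"the quick brown fox jumps over the lazy dog"
target = b"the quick red fox jumps over the lazy dog"
ops = delta_encode(base, target)
print(ops)  # [('copy', 0, 10), ('insert', b'red'), ('copy', 15, 28)]
print(delta_decode(base, ops) == target)  # True
```

Only the three bytes `red` are stored literally; the rest of the target becomes two references into the base chunk. How well this works depends on picking a similar base chunk, which is the resemblance-detection problem the Finesse title refers to.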

ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look-Ahead Window Assisted Chunk Caching

Z Cao, H Wen, F Wu, DHC Du - 16th USENIX Conference on File and …, 2018 - usenix.org
Data deduplication has been widely applied in storage systems to improve the efficiency of
space utilization. In data deduplication systems, the data restore performance is seriously …

CIDR: A cost-effective in-line data reduction system for terabit-per-second scale SSD arrays

M Ajdari, P Park, J Kim, D Kwon… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
An SSD array, a storage system consisting of multiple SSDs per node, has become a design
choice to implement a fast primary storage system, and modern storage architects now aim …

Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance

Z Cao, S Liu, F Wu, G Wang, B Li, DHC Du - 17th USENIX Conference …, 2019 - usenix.org
Data deduplication is an effective way of improving storage space utilization. The data
generated by deduplication is persistently stored in data chunks or data containers (a …
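Several entries above (ALACC and the sliding look-back window) target restore performance, which degrades when a backup's chunks are scattered across many containers. The toy model below counts container reads during a restore, assuming fixed containers, a whole-container read on every cache miss, and a chunk-level LRU cache. It is not the adaptive look-ahead window or the rewriting scheme from those papers, and `count_container_reads` and its parameters are hypothetical.

```python
from collections import OrderedDict


def count_container_reads(recipe: list[str], location: dict[str, int],
                          containers: dict[int, list[str]], cache_size: int) -> int:
    """Replay a restore and return how many containers had to be read from disk.

    recipe:     chunk fingerprints in the order the restored file needs them
    location:   fingerprint -> ID of the container that stores the chunk
    containers: container ID -> fingerprints packed into that container
    cache_size: LRU cache capacity, in chunks
    """
    cache = OrderedDict()  # fingerprint -> None, ordered least to most recently used
    reads = 0
    for fp in recipe:
        if fp in cache:
            cache.move_to_end(fp)  # cache hit: mark as most recently used
            continue
        # Cache miss: read the whole container and cache every chunk in it.
        reads += 1
        for chunk in containers[location[fp]]:
            cache[chunk] = None
            cache.move_to_end(chunk)
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used chunk
    return reads


containers = {0: ["A", "B", "C", "D"], 1: ["E", "F", "G", "H"]}
location = {fp: cid for cid, fps in containers.items() for fp in fps}
sequential = ["A", "B", "C", "D", "E", "F", "G", "H"]
interleaved = ["A", "E", "B", "F", "C", "G", "D", "H"]

print(count_container_reads(sequential, location, containers, cache_size=4))   # 2
print(count_container_reads(interleaved, location, containers, cache_size=4))  # 8
print(count_container_reads(interleaved, location, containers, cache_size=8))  # 2
```

Restoring the same eight chunks in an interleaved order costs 8 container reads with a 4-chunk cache but only 2 with an 8-chunk cache, illustrating why cache management and chunk layout both matter for restore speed.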

Light-Dedup: A Light-weight Inline Deduplication Framework for Non-Volatile Memory File Systems

J Qiu, Y Pan, W Xia, X Huang, W Wu, X Zou… - 2023 USENIX Annual …, 2023 - usenix.org
Emerging NVM is promising to become the next-generation storage media. However, its
high cost hinders its development. Recent deduplication researches in NVM file systems …

OrderMergeDedup: Efficient, Failure-Consistent Deduplication on Flash

Z Chen, K Shen - 14th USENIX Conference on File and Storage …, 2016 - usenix.org
Flash storage is commonplace on mobile devices, sensors, and cloud servers. I/O
deduplication is beneficial for saving the storage space and reducing expensive Flash …