Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression

Y Zhang, W Xia, D Feng, H Jiang, Y Hua… - 17th USENIX Conference …, 2019 - usenix.org
In storage systems, delta compression is often used as a complementary data reduction
technique for data deduplication because it is able to eliminate redundancy among the non …

Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance

Z Cao, S Liu, F Wu, G Wang, B Li, DHC Du - 17th USENIX Conference …, 2019 - usenix.org
Data deduplication is an effective way of improving storage space utilization. The data
generated by deduplication is persistently stored in data chunks or data containers (a …

Lipa: A learning-based indexing and prefetching approach for data deduplication

G Xu, B Tang, H Lu, Q Yu… - 2019 35th Symposium on …, 2019 - ieeexplore.ieee.org
In this paper, we present a learning-based data deduplication algorithm, called LIPA, which
uses the reinforcement learning framework to build an adaptive indexing structure. It is …

TDDFS: A tier-aware data deduplication-based file system

Z Cao, H Wen, X Ge, J Ma, J Diehl… - ACM Transactions on …, 2019 - dl.acm.org
With the rapid increase in the amount of data produced and the development of new types of
storage devices, storage tiering continues to be a popular way to achieve a good tradeoff …

Prefetch-aware fingerprint cache management for data deduplication systems

M Li, H Zhang, Y Wu, C Zhao - Frontiers of Computer Science, 2019 - Springer
Data deduplication has been widely utilized in large-scale storage systems, particularly
backup systems. Data deduplication systems typically divide data streams into chunks and …

CSF: An efficient parallel deduplication algorithm by clustering scattered fingerprints

H Fan, G Xu, Y Zhang, L Yuan… - 2019 IEEE Intl Conf on …, 2019 - ieeexplore.ieee.org
Deduplication is one of the most effective and efficient techniques to save memory space. It
is widely used in data centers and cloud storage systems. Multi-stream concurrency is …

The power of better choice: Reducing relocations in cuckoo filter

F Wang, H Chen, L Liao, F Zhang… - 2019 IEEE 39th …, 2019 - ieeexplore.ieee.org
Efficient set representation and membership testing are important in various big data
applications. The state-of-the-art Cuckoo filter design shows great advantages in both query …

Z-Dedup: A case for deduplicating compressed contents in cloud

Z Yan, H Jiang, Y Tan, S Skelton… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Lossless data reduction techniques, particularly compression and deduplication, have
emerged as effective approaches to tackling the combined challenge of explosive growth in …

Quotient hash tables: efficiently detecting duplicates in streaming data

R Géraud, M Lombard-Platet, D Naccache - Proceedings of the 34th …, 2019 - dl.acm.org
This article presents the Quotient Hash Table (QHT), a new data structure for duplicate
detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient …

Extracting Better Performance From The Parallelism Offered By SSDs

N Elyasi - 2019 - etda.libraries.psu.edu
The majority of growth in the industry is driven by massive data processing which in turn is
driving a tremendous need for high performance storage. To satisfy the lower latency …