A survey and classification of storage deduplication systems

J Paulo, J Pereira - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
The automatic elimination of duplicate data in a storage system, commonly known as
deduplication, is increasingly accepted as an effective technique to reduce storage costs …

Data deduplication techniques for efficient cloud storage management: a systematic review

R Kaur, I Chana, J Bhattacharya - The Journal of Supercomputing, 2018 - Springer
The exponential growth of digital data in cloud storage systems is a critical issue presently
as a large amount of duplicate data in the storage systems exerts an extra load on it …

A comprehensive study of the past, present, and future of data deduplication

W Xia, H Jiang, D Feng, F Douglis… - Proceedings of the …, 2016 - ieeexplore.ieee.org
Data deduplication, an efficient approach to data reduction, has gained increasing attention
and popularity in large-scale storage systems due to the explosive growth of digital data. It …

A study of practical deduplication

DT Meyer, WJ Bolosky - ACM Transactions on Storage (ToS), 2012 - dl.acm.org
We collected file system content data from 857 desktop computers at Microsoft over a span
of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication …

BloomFlash: Bloom filter on flash-based storage

B Debnath, S Sengupta, J Li, DJ Lilja… - 2011 31st International …, 2011 - ieeexplore.ieee.org
The bloom filter is a probabilistic data structure that provides a compact representation of a
set of elements. To keep false positive probabilities low, the size of the bloom filter must be …

Extreme binning: Scalable, parallel deduplication for chunk-based file backup

D Bhagwat, K Eshghi, DDE Long… - … on Modeling, Analysis …, 2009 - ieeexplore.ieee.org
Data deduplication is an essential and critical component of backup systems. Essential,
because it reduces storage space requirements, and critical, because the performance of …

CAFTL: A Content-Aware flash translation layer enhancing the lifespan of flash memory based solid state drives

F Chen, T Luo, X Zhang - 9th USENIX Conference on File and Storage …, 2011 - usenix.org
Although Flash Memory based Solid State Drive (SSD) exhibits high performance
and low power consumption, a critical concern is its limited lifespan along with the …

FlashStore: High throughput persistent key-value store

B Debnath, S Sengupta, J Li - Proceedings of the VLDB Endowment, 2010 - dl.acm.org
We present FlashStore, a high throughput persistent key-value store, that uses flash memory
as a non-volatile cache between RAM and hard disk. FlashStore is designed to store the …

iDedup: latency-aware, inline data deduplication for primary storage

K Srinivasan, T Bisson, GR Goodson, K Voruganti - FAST, 2012 - usenix.org
Deduplication technologies are increasingly being deployed to reduce cost and increase
space-efficiency in corporate data centers. However, prior research has not applied …

Characteristics of backup workloads in production systems

G Wallace, F Douglis, H Qian, P Shilane, S Smaldone… - FAST, 2012 - usenix.org