A survey and classification of storage deduplication systems

J Paulo, J Pereira - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
The automatic elimination of duplicate data in a storage system, commonly known as
deduplication, is increasingly accepted as an effective technique to reduce storage costs …

Big data reduction framework for value creation in sustainable enterprises

MH ur Rehman, V Chang, A Batool, TY Wah - International journal of …, 2016 - Elsevier
Value creation is a major sustainability factor for enterprises, in addition to profit
maximization and revenue generation. Modern enterprises collect big data from various …

A comprehensive study of the past, present, and future of data deduplication

W Xia, H Jiang, D Feng, F Douglis… - Proceedings of the …, 2016 - ieeexplore.ieee.org
Data deduplication, an efficient approach to data reduction, has gained increasing attention
and popularity in large-scale storage systems due to the explosive growth of digital data. It …

POCLib: A high-performance framework for enabling near orthogonal processing on compression

F Zhang, J Zhai, X Shen, O Mutlu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Parallel technology boosts data processing in recent years, and parallel direct data
processing on hierarchically compressed documents exhibits great promise. The high …

Big data reduction methods: a survey

MH ur Rehman, CS Liew, A Abbas… - Data Science and …, 2016 - Springer
Research on big data analytics is entering in the new phase called fast data where multiple
gigabytes of data arrive in the big data systems every second. Modern big data systems …

iDedup: latency-aware, inline data deduplication for primary storage

K Srinivasan, T Bisson, GR Goodson, K Voruganti - FAST, 2012 - usenix.org
Deduplication technologies are increasingly being deployed to reduce cost and increase
space-efficiency in corporate data centers. However, prior research has not applied …

Design tradeoffs for data deduplication performance in backup workloads

M Fu, D Feng, Y Hua, X He, Z Chen, W Xia… - … USENIX Conference on …, 2015 - usenix.org
Data deduplication has become a standard component in modern backup systems. In order
to understand the fundamental tradeoffs in each of its design choices (such as prefetching …

FastCDC: A fast and efficient Content-Defined chunking approach for data deduplication

W Xia, Y Zhou, H Jiang, D Feng, Y Hua, Y Hu… - 2016 USENIX Annual …, 2016 - usenix.org
Content-Defined Chunking (CDC) has been playing a key role in data deduplication
systems in the past 15 years or so due to its high redundancy detection ability. However …

WAN-optimized replication of backup datasets using stream-informed delta compression

P Shilane, M Huang, G Wallace, W Hsu - ACM Transactions on Storage …, 2012 - dl.acm.org
Replicating data off site is critical for disaster recovery reasons, but the current approach of
transferring tapes is cumbersome and error prone. Replicating across a wide area network …

Accelerating restore and garbage collection in deduplication-based backup systems via exploiting historical information

M Fu, D Feng, Y Hua, X He, Z Chen, W Xia… - 2014 USENIX Annual …, 2014 - usenix.org
In deduplication-based backup systems, the chunks of each backup are physically scattered
after deduplication, which causes a challenging fragmentation problem. The fragmentation …