Advanced data deduplication techniques and their application

D Meister - 2013 - openscience.ub.uni-mainz.de
Abstract: Data deduplication describes a class of … One of the most important applications
of data deduplication are … so-called fingerprinting-based data deduplication. It starts with the …

Inde: An inline data deduplication approach via adaptive detection of valid container utilization

L Lin, Y Deng, Y Zhou, Y Zhu - ACM Transactions on Storage, 2023 - dl.acm.org
… Then, VCRC change between the current iteration and the previous iteration is calculated …
the current iteration is fewer than 1, or VCRC change meets Equation (1), then the iteration will …

Interactive deduplication using active learning

S Sarawagi, A Bhamidipaty - Proceedings of the eighth ACM SIGKDD …, 2002 - dl.acm.org
… In each iteration, the user aids the learner by providing new labeled data. A useful side effect
of the user inspecting the model's prediction at each iteration is that, he can discover newer …

I-sieve: An inline high performance deduplication system used in cloud storage

J Wang, Z Zhao, Z Xu, H Zhang, L Li… - Tsinghua Science and …, 2015 - ieeexplore.ieee.org
… Third, we can clearly see the changes of Windows systems and their applications iteration
versions from the variety of deduplication ratio as shown in Fig. 5b, which also points us a …

RecD: Deduplication for end-to-end deep learning recommendation model training infrastructure

M Zhao, D Choudhary, D Tyagi… - Proceedings of …, 2023 - proceedings.mlsys.org
… the amount of data over the network in each iteration directly improves training throughput. …
of EMB lookups we need to perform in each iteration and thus required memory bandwidth. …

Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module

J Rathbone, M Carter, T Hoffmann, P Glasziou - Systematic reviews, 2015 - Springer
… First and second iteration The first iteration of the deduplication algorithm achieved 75.0% …
This increased the sensitivity of the second iteration to 95.7% with more duplicates detected…

Automation of duplicate record detection for systematic reviews: Deduplicator

C Forbes, H Greenwood, M Carter, J Clark - Systematic reviews, 2024 - Springer
… However, further research should be performed comparing more deduplication methods to
… , deduplication algorithms can be defined in configuration files, which specify each iteration, …

Reducing impact of data fragmentation caused by in-line deduplication

M Kaczmarczyk, M Barczynski, W Kilian… - Proceedings of the 5th …, 2012 - dl.acm.org
… deduplication. In such approach a backup is written without any deduplication, and later the
dedup … The algorithm iterates in a loop over the backup stream being written deciding for …

De-duplication of aggregation authority files

P Manghi, M Mikulicic, C Atzori - International Journal of …, 2012 - inderscienceonline.com
… To facilitate record deduplication in the presence of such heterogeneity and dynamicity,
PACE user interfaces enable an iterative curation process, where data curators can: (i) configure …

ObjDedup: High-Throughput Object Storage Layer for Backup Systems With Block-Level Deduplication

A Jackowski, Ł Ślusarczyk, K Lichota… - … on Parallel and …, 2023 - ieeexplore.ieee.org
… The reason is that such a read can be done on demand later without affecting the critical
path and the tree iteration, and we wanted the algorithm never to read more leaves than …