An effective and cost-based framework for a qualitative hybrid data deduplication


CR Haruna, MS Hou, MJ Eghan… - Advances in Computer …, 2019 - Springer

CR Haruna, MS Hou, MJ Eghan, MY Kpiebaareh, L Tandoh

Advances in Computer Communication and Computational Sciences: Proceedings of …, 2019, Springer

Abstract

In the real world, an entity may occur several times in a database. These duplicates may have varying keys and/or contain errors, which makes deduplication a difficult task. Deduplication cannot be solved accurately using machine-based or crowdsourcing techniques alone. Crowdsourcing has been used to address the shortcomings of machine-based approaches. Compared to machines, the crowd provides relatively accurate results, but it is slow and very expensive. A hybrid technique for data deduplication using a Euclidean distance and a chromatic correlation clustering algorithm was presented. The technique aimed at reducing the crowdsourcing cost, reducing the time the crowd spends on deduplication, and providing higher deduplication accuracy. In the experiments, the proposed algorithm was compared with some existing techniques and outperformed some of them, offering high deduplication accuracy and efficiency while incurring a low crowdsourcing cost.



