Big data semantics

P Ceravolo, A Azzini, M Angelini, T Catarci… - Journal on Data …, 2018 - Springer
Big Data technology has discarded traditional data modeling approaches as no longer
applicable to distributed data processing. It is, however, largely recognized that Big Data …

Entity reconciliation in big data sources: A systematic mapping study

JG Enríquez, FJ Domínguez-Mayo, MJ Escalona… - Expert Systems with …, 2017 - Elsevier
The entity reconciliation (ER) problem has aroused much interest as a research topic in today's
Big Data era, full of big and open heterogeneous data sources. This problem arises when …

Content sensitivity based access control framework for Hadoop

TKA Kumar, H Liu, JP Thomas, X Hou - Digital Communications and …, 2017 - Elsevier
Big data technologies have seen tremendous growth in recent years. They are widely used
in both industry and academia. In spite of such exponential growth, these technologies lack …

A Review of Data Cleaning Methods for Web Information System.

J Wang, X Wang, Y Yang… - … , Materials & Continua, 2020 - pdfs.semanticscholar.org
Web information system (WIS) is frequently-used and indispensable in daily social life. WIS
provides information services in many scenarios, such as electronic commerce …

A Qualitative Literature Review on Linkage Techniques for Data Integration.

F Kruse, AP Hassan, JP Awick, JM Gómez, T Bui - HICSS, 2020 - academia.edu
The data linkage techniques “entity linking” and “record linkage” get rising attention as they
enable the integration of multiple data sources for data, web, and text mining approaches …

Computational intelligence techniques for assessing data quality: towards knowledge-driven processing

N Afriliana, D Król, FL Gaol - International Conference on Computational …, 2021 - Springer
Since the right decision is made from the correct data, assessing data quality is an important
process in computational science when working in a data-driven environment. Appropriate …

Approximate quality assessment with sampling approaches

H Liu, Z Sang, S Karali - 2019 International Conference on …, 2019 - ieeexplore.ieee.org
Data is useful to the extent that it can be quickly analyzed to reveal valuable information.
With high-quality data, we can increase revenue, reduce cost, and reduce risk. On the other …

AUTO-CDD: automatic cleaning dirty data using machine learning techniques

MZH Jesmeen, A Hossen, J Hossen… - TELKOMNIKA …, 2019 - telkomnika.uad.ac.id
Cleaning dirty data has been of critical significance for many years, especially in
medical sectors. This is the reason behind widening research in this sector. To initiate the …

A survey on data cleaning methods in cyberspace

J Wang, H Zhang, B Fang, X Wang… - 2017 IEEE Second …, 2017 - ieeexplore.ieee.org
Cyberspace offers users and information communication systems the opportunity to interact
with each other for business. Data, as the carrier of information, represents the processing …

SjClust: A Framework for Incorporating Clustering into Set Similarity Join Algorithms

LA Ribeiro, A Cuzzocrea, KAA Bezerra… - Transactions on Large …, 2018 - Springer
A critical task in data cleaning and integration is the identification of duplicate records
representing the same real-world entity. Similarity join is largely used in order to detect pairs …