Reproducible experiments on three-dimensional entity resolution with JedAI

On tuning parameters guiding similarity computations in a data deduplication pipeline for customers records: Experience from a R&D project

W Andrzejewski, B Bębel, P Boiński, R Wrembel - Information Systems, 2024 - Elsevier

Data stored in information systems are often erroneous. Duplicate data are one of the typical
error types. To discover and handle duplicates, the so-called deduplication methods are …

Cited by 4 · Related articles

[PDF] arxiv.org

Unsupervised matching of data and text

N Ahmadi, H Sand, P Papotti - 2022 IEEE 38th International …, 2022 - ieeexplore.ieee.org

Entity resolution is a widely studied problem with several proposals to match records across
relations. Matching textual content is a widespread task in many applications, such as …

Cited by 25 · Related articles · All 8 versions

Data integration, cleaning, and deduplication: Research versus industrial projects

R Wrembel - … Conference on Information Integration and Web, 2022 - Springer

In business applications, data integration is typically implemented as a data warehouse
architecture. In this architecture, heterogeneous and distributed data sources are accessed …

Cited by 7 · Related articles · All 2 versions

[PDF] arxiv.org

Deep clustering for data cleaning and integration

HT Rauf, A Freitas, NW Paton - arXiv preprint arXiv:2305.13494, 2023 - arxiv.org

Deep Learning (DL) techniques now constitute the state-of-the-art for important problems in
areas such as text and image processing, and there have been impactful results that deploy …

Cited by 3 · Related articles · All 3 versions

pyJedAI: A Library with Resolution-Related Structures and Procedures for Products

E Ioannou, K Nikoletos… - INFORMS Journal on …, 2024 - pubsonline.informs.org

This work presents an open-source Python library, named pyJedAI, which provides
functionalities supporting the creation of algorithms related to product entity resolution …

TableDC: Deep Clustering for Tabular Data

HT Rauf, A Freitas, NW Paton - arXiv preprint arXiv:2405.17723, 2024 - arxiv.org

Deep clustering (DC), a fusion of deep representation learning and clustering, has recently
demonstrated positive results in data science, particularly text processing and computer …

Cited by 1 · Related articles

On tuning parameters guiding similarity computations in a data deduplication pipeline for customers records

W Andrzejewski, B Bębel, P Boiński, R Wrembel - 2024 - dl.acm.org

Data stored in information systems are often erroneous. Duplicate data are one of the typical
error types. To discover and handle duplicates, the so-called deduplication methods are …