Creating embeddings of heterogeneous relational datasets for data integration tasks

R Cappuzzo, P Papotti… - Proceedings of the 2020 …, 2020 - dl.acm.org
Deep learning based techniques have been recently used with promising results for data
integration problems. Some methods directly use pre-trained embeddings that were trained …

Data Curation with Deep Learning

S Thirumuruganathan, N Tang, M Ouzzani, AH Doan - EDBT, 2020 - openproceedings.org
Data curation – the process of discovering, integrating, and cleaning data – is one of the
oldest, hardest, yet inevitable data management problems. Despite decades of efforts from …

Entity and relation matching consensus for entity alignment

J Yang, D Wang, W Zhou, W Qian, X Wang… - Proceedings of the 30th …, 2021 - dl.acm.org
Entity alignment aims to match synonymous entities across different knowledge graphs,
which is a fundamental task for knowledge integration. Recently, researchers have devoted …

Exploring Federated Learning for Data Integration: A Structured Literature Review

JP Awick, G Schumann… - … Conference on Big Data …, 2023 - ieeexplore.ieee.org
Data integration is utilized to integrate heterogeneous data from multiple sources,
representing a crucial step to improve information value in data analysis and mining …

EmbDI: generating embeddings for relational data integration

R Cappuzzo, P Papotti… - 29th Italian Symposium on …, 2021 - ceur-ws.org
Deep learning techniques have been used with promising results for data integration
problems. Some methods use pre-trained embeddings that were trained on a large corpus …

Local embeddings for relational data integration

R Cappuzzo, P Papotti… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning based techniques have been recently used with promising results for data
integration problems. Some methods directly use pre-trained embeddings that were trained …

Discovering lexical similarity using articulatory feature-based phonetic edit distance

T Ahmed, M Suffian, MY Khan, A Bogliolo - IEEE Access, 2021 - ieeexplore.ieee.org
Lexical Similarity (LS) between two languages uncovers many interesting linguistic insights
such as phylogenetic relationship, mutual intelligibility, common etymology, and loan words …

Deep learning models for tabular data curation

R Cappuzzo - 2022 - theses.hal.science
Data retention is a pervasive and far-reaching topic, affecting everything from academia to
industry. Current solutions rely on manual work by domain users, but they are not adequate …

Semi-supervised data cleaning

MM Lahijani - 2020 - search.proquest.com
Data cleaning is one of the most important but time-consuming tasks for data scientists. The
data cleaning task consists of two major steps: (1) error detection and (2) error correction …

Artificial intelligence system employing multimodal learning for analyzing entity record relationships

X Chen, L Wang, A Dutta - US Patent 11,423,072, 2022 - Google Patents
Respective text feature sets and non-text feature sets are generated corresponding to
individual pairs of a plurality of record pairs. At least one text feature is based on whether a …