(Almost) all of entity resolution

O Binette, RC Steorts - Science Advances, 2022 - science.org
Whether the goal is to estimate the number of people that live in a congressional district, to
estimate the number of individuals that have died in an armed conflict, or to disambiguate …

RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation

N Tang, J Fan, F Li, J Tu, X Du, G Li, S Madden… - arXiv preprint arXiv …, 2020 - arxiv.org
Can AI help automate human-easy but computer-hard data preparation tasks that burden
data scientists, practitioners, and crowd workers? We answer this question by presenting …

Saga: A platform for continuous construction and serving of knowledge at scale

IF Ilyas, T Rekatsinas, V Konda, J Pound, X Qi… - Proceedings of the …, 2022 - dl.acm.org
We introduce Saga, a next-generation knowledge construction and serving platform for
powering knowledge-based applications at industrial scale. Saga follows a hybrid batch …

Jellyfish: Instruction-tuning local large language models for data preprocessing

H Zhang, Y Dong, C Xiao… - Proceedings of the 2024 …, 2024 - aclanthology.org
This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the
data mining pipeline that transforms raw data into a clean format. We instruction-tune local …

Fact Ranking over Large-Scale Knowledge Graphs with Reasoning Embedding Models

H Ren, A Mousavi, A Pacaci, SR Chowdhury… - IEEE Data Eng …, 2023 - sites.computer.org
Knowledge graphs (KGs) serve as the backbone of many applications such as
recommendation systems and question answering. All these applications require reasoning …

MetaHive: A Cache-Optimized Metadata Management for Heterogeneous Key-Value Stores

A Heidari, A Ahmadi, Z Zhi, W Zhang - arXiv preprint arXiv:2407.19090, 2024 - arxiv.org
Cloud key-value (KV) stores provide businesses with a cost-effective and adaptive
alternative to traditional on-premise data management solutions. KV stores frequently …

Uncertainty Management in the Construction of Knowledge Graphs: a Survey

L Jarnac, Y Chabot, M Couceiro - arXiv preprint arXiv:2405.16929, 2024 - arxiv.org
Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in
data representation and their numerous applications, e.g., vocabulary sharing, Q/A or …

Frost: a platform for benchmarking and exploring data matching results

M Graf, L Laskowski, F Papsdorf, F Sold… - arXiv preprint arXiv …, 2021 - arxiv.org
"Bad" data has a direct impact on 88% of companies, with the average company losing 12%
of its revenue due to it. Duplicates, multiple but different representations of the same real …

UpLIF: An Updatable Self-Tuning Learned Index Framework

A Heidari, A Ahmadi, W Zhang - arXiv preprint arXiv:2408.04113, 2024 - arxiv.org
The emergence of learned indexes has caused a paradigm shift in our perception of
indexing by considering indexes as predictive models that estimate keys' positions within a …

Uncertainty Management in the Construction of Knowledge Graphs: a Survey

M Couceiro, L Jarnac, Y Chabot - 2024 - inria.hal.science
Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in
data representation and their numerous applications, e.g., vocabulary sharing, Q/A or …