Data preparation: A technological perspective and review

AAA Fernandes, M Koehler, N Konstantinou… - SN Computer …, 2023 - Springer
Data analysis often uses data sets that were collected for different purposes. Indeed, new
insights are often obtained by combining data sets that were produced independently of …

Raha: A configuration-free error detection system

M Mahdavi, Z Abedjan, R Castro Fernandez… - Proceedings of the …, 2019 - dl.acm.org
Detecting erroneous values is a key step in data cleaning. Error detection algorithms usually
require a user to provide input configurations in the form of rules or statistical parameters …

RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation

N Tang, J Fan, F Li, J Tu, X Du, G Li, S Madden… - arXiv preprint arXiv …, 2020 - arxiv.org
Can AI help automate human-easy but computer-hard data preparation tasks that burden
data scientists, practitioners, and crowd workers? We answer this question by presenting …

Assessing and remedying coverage for a given dataset

A Asudeh, Z Jin, HV Jagadish - 2019 IEEE 35th International …, 2019 - ieeexplore.ieee.org
Data analysis impacts virtually every aspect of our society today. Often, this analysis is
performed on an existing dataset, possibly collected through a process that the data …

Construction of Knowledge Graphs: Current State and Challenges

M Hofer, D Obraczka, A Saeedi, H Köpcke, E Rahm - Information, 2024 - mdpi.com
With Knowledge Graphs (KGs) at the center of numerous applications such as recommender
systems and question-answering, the need for generalized pipelines to construct and …

Construction of knowledge graphs: State and challenges

M Hofer, D Obraczka, A Saeedi, H Köpcke… - arXiv preprint arXiv …, 2023 - arxiv.org
With knowledge graphs (KGs) at the center of numerous applications such as recommender
systems and question answering, the need for generalized pipelines to construct and …

VerifAI: verified generative AI

N Tang, C Yang, J Fan, L Cao, Y Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has made significant strides, yet concerns about the accuracy and reliability of
its outputs continue to grow. Such inaccuracies can have serious consequences such as …

Predicting achievement of students in smart campus

S Qu, K Li, S Zhang, Y Wang - IEEE Access, 2018 - ieeexplore.ieee.org
Isolate data among different campus information systems and not much effective information
among the big data generated by these systems cause that it is a challenge for predicting …

Cleaning data with Llunatic

F Geerts, G Mecca, P Papotti, D Santoro - The VLDB Journal, 2020 - Springer
Data cleaning (or data repairing) is considered a crucial problem in many database-related
tasks. It consists in making a database consistent with respect to a given set of constraints. In …

AlphaClean: Automatic generation of data cleaning pipelines

S Krishnan, E Wu - arXiv preprint arXiv:1904.11827, 2019 - arxiv.org
The analyst effort in data cleaning is gradually shifting away from the design of hand-written
scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper …