A survey of deep active learning

P Ren, Y Xiao, X Chang, PY Huang, Z Li… - ACM computing …, 2021 - dl.acm.org
Active learning (AL) attempts to maximize a model's performance gain while annotating the
fewest samples possible. Deep learning (DL) is greedy for data and requires a large amount …
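To make the "annotate the fewest samples" idea concrete, here is a minimal sketch of one classic acquisition strategy, least-confidence uncertainty sampling. The survey covers many query strategies; this one is an assumption chosen for illustration. The function name `least_confidence_query` and the toy probabilities are hypothetical, and in practice the probabilities would come from the current model's predictions on the unlabeled pool.

```python
from typing import List, Sequence


def least_confidence_query(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` unlabeled samples the model is least sure about.

    `probs[i]` is the (hypothetical) predicted class distribution for pool sample i.
    """
    # Uncertainty score: 1 - probability of the most likely class.
    scores = [(1.0 - max(p), i) for i, p in enumerate(probs)]
    # Most uncertain first; ties broken by the lower index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    # These indices would be sent to a human annotator in the next AL round.
    return [i for _, i in scores[:budget]]


if __name__ == "__main__":
    pool_probs = [
        [0.95, 0.05],  # confident prediction
        [0.55, 0.45],  # uncertain
        [0.70, 0.30],
        [0.50, 0.50],  # maximally uncertain for two classes
    ]
    print(least_confidence_query(pool_probs, budget=2))
```

With this toy pool the sketch selects samples 3 and 1, the two predictions closest to a coin flip, and leaves the confident ones unlabeled.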

Data-driven materials research enabled by natural language processing and information extraction

EA Olivetti, JM Cole, E Kim, O Kononova… - Applied Physics …, 2020 - pubs.aip.org
Given the emergence of data science and machine learning throughout all aspects of
society, but particularly in the scientific domain, there is increased importance placed on …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Deep entity matching with pre-trained language models

Y Li, J Li, Y Suhara, AH Doan, WC Tan - arXiv preprint arXiv:2004.00584, 2020 - arxiv.org
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …
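Casting entity matching as sequence-pair classification requires turning two structured records into one text input. The sketch below follows, to the best of our knowledge, the serialization described in the Ditto paper: each attribute is tagged `[COL] name [VAL] value`, and the two records are joined with `[SEP]`. The function names and sample records are hypothetical. In a real pipeline the tokenizer usually inserts `[CLS]`/`[SEP]` itself, and the fine-tuned match/no-match classifier is not shown.

```python
from typing import Dict


def serialize_entity(record: Dict[str, str]) -> str:
    # Each attribute becomes "[COL] <attribute name> [VAL] <attribute value>",
    # in the record's insertion order (dicts preserve order in Python 3.7+).
    return " ".join(f"[COL] {name} [VAL] {value}" for name, value in record.items())


def serialize_pair(left: Dict[str, str], right: Dict[str, str]) -> str:
    # One sequence-pair input for a BERT-style binary classifier (match vs. no match).
    return f"[CLS] {serialize_entity(left)} [SEP] {serialize_entity(right)} [SEP]"


if __name__ == "__main__":
    a = {"title": "iPhone 12", "price": "799"}
    b = {"title": "Apple iPhone 12 64GB", "price": "799.00"}
    print(serialize_pair(a, b))
```

Because the attribute names travel inside the text, the language model can attend across matching columns of the two records without a hand-built feature extractor.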

A comprehensive survey on automatic knowledge graph construction

L Zhong, J Wu, Q Li, H Peng, X Wu - ACM Computing Surveys, 2023 - dl.acm.org
Automatic knowledge graph construction aims at manufacturing structured human
knowledge. To this end, much effort has historically been spent extracting informative fact …

Neo: A learned query optimizer

R Marcus, P Negi, H Mao, C Zhang, M Alizadeh… - arXiv preprint arXiv …, 2019 - arxiv.org
Query optimization is one of the most challenging problems in database systems. Despite
the progress made over the past decades, query optimizers remain extremely complex …

An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

A benchmarking study of embedding-based entity alignment for knowledge graphs

Z Sun, Q Zhang, W Hu, C Wang, M Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the
same real-world object. Recent advancement in KG embedding impels the advent of …
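Embedding-based entity alignment typically ends with a nearest-neighbour search: once entities from both KGs are embedded in a shared vector space, each source entity is matched to its most similar target entity. The minimal sketch below assumes the embeddings are already given and already comparable. The toy 2-D vectors, entity names, and helper names are hypothetical; real systems learn them with a KG embedding model and seed alignments. It also computes Hits@1, a common evaluation metric for this task.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 for zero-length vectors to avoid division by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align_greedy(src: Dict[str, Vector], tgt: Dict[str, Vector]) -> Dict[str, str]:
    # For every source entity, pick the target entity with the highest cosine similarity.
    # Greedy: several source entities may map to the same target.
    return {s: max(tgt, key=lambda t: cosine(vs, tgt[t])) for s, vs in src.items()}


def hits_at_1(pred: Dict[str, str], gold: Dict[str, str]) -> float:
    # Fraction of gold pairs whose top-ranked prediction is the correct counterpart.
    correct = sum(1 for s, t in gold.items() if pred.get(s) == t)
    return correct / len(gold) if gold else 0.0


if __name__ == "__main__":
    kg1 = {"kg1:Paris": [0.9, 0.1], "kg1:Berlin": [0.1, 0.9]}
    kg2 = {"kg2:Paris": [0.85, 0.2], "kg2:Berlin": [0.2, 0.8]}
    pred = align_greedy(kg1, kg2)
    print(pred, hits_at_1(pred, {"kg1:Paris": "kg2:Paris", "kg1:Berlin": "kg2:Berlin"}))
```

Greedy nearest-neighbour matching is the simplest decoding choice. Some methods add a one-to-one constraint so that two source entities cannot claim the same target.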

[BOOK] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

DeepER -- Deep Entity Resolution

M Ebraheem, S Thirumuruganathan, S Joty… - arXiv preprint arXiv …, 2017 - arxiv.org
Entity resolution (ER) is a key data integration problem. Despite efforts over 70+ years in all
aspects of ER, there is still a high demand for democratizing ER; humans are heavily …