Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org

Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Cited by 198 · Related articles · All 5 versions

[HTML] mdpi.com

Construction of knowledge graphs: current state and challenges

M Hofer, D Obraczka, A Saeedi, H Köpcke, E Rahm - Information, 2024 - mdpi.com

With Knowledge Graphs (KGs) at the center of numerous applications such as recommender
systems and question-answering, the need for generalized pipelines to construct and …

Cited by 15 · Related articles

[PDF] arxiv.org

Neo: A learned query optimizer

R Marcus, P Negi, H Mao, C Zhang, M Alizadeh… - arXiv preprint arXiv …, 2019 - arxiv.org

Query optimization is one of the most challenging problems in database systems. Despite
the progress made over the past decades, query optimizers remain extremely complex …

Cited by 473 · Related articles · All 22 versions

[PDF] ird.fr

Big graphs: challenges and opportunities

W Fan - Proceedings of the VLDB Endowment, 2022 - dl.acm.org

Big data is typically characterized with 4V's: Volume, Velocity, Variety and Veracity. When it
comes to big graphs, these challenges become even more staggering. Each and every of …

Cited by 21 · Related articles · All 3 versions

[PDF] vldb.org

Baran: Effective error correction via a unified context representation and transfer learning

M Mahdavi, Z Abedjan - Proceedings of the VLDB Endowment, 2020 - dl.acm.org

Traditional error correction solutions leverage handmade rules or master data to find the
correct values. Both are often amiss in real-world scenarios. Therefore, it is desirable to …

Cited by 105 · Related articles · All 4 versions

[PDF] acm.org

Rotom: A meta-learned data augmentation framework for entity matching, data cleaning, text classification, and beyond

Z Miao, Y Li, X Wang - … of the 2021 International Conference on …, 2021 - dl.acm.org

Deep Learning revolutionizes almost all fields of computer science including data
management. However, the demand for high-quality training data is slowing down deep …

Cited by 69 · Related articles · All 3 versions

[PDF] github.io

Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks

C Yan, Y He - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org

Data preparation is widely recognized as the most time-consuming process in modern
business intelligence (BI) and machine learning (ML) projects. Automating complex data …

Cited by 96 · Related articles · All 4 versions

Machine learning and data cleaning: Which serves the other?

IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org

The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …

Cited by 56 · Related articles

[PDF] arxiv.org

VerifAI: verified generative AI

N Tang, C Yang, J Fan, L Cao, Y Luo… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative AI has made significant strides, yet concerns about the accuracy and reliability of
its outputs continue to grow. Such inaccuracies can have serious consequences such as …

Cited by 22 · Related articles · All 4 versions

[PDF] arxiv.org

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org

Data quality affects machine learning (ML) model performances, and data scientists spend
considerable amount of time on data cleaning before model training. However, to date, there …

Cited by 139 · Related articles · All 8 versions
