Authors
Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré
Publication date
2015/7
Journal
Proceedings of the VLDB Endowment International Conference on Very Large Data Bases
Volume
8
Issue
11
Pages
1310
Publisher
NIH Public Access
Description
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration. Recent names used for this problem include dealing with dark data and knowledge base construction (KBC). In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. We observe that the KBC process is iterative, and we develop techniques to incrementally produce inference results for KBC systems. We propose two methods for incremental inference, based respectively on sampling and variational techniques. We also study the tradeoff space of these methods and develop a simple rule-based optimizer. DeepDive includes all of these contributions, and we evaluate DeepDive on five KBC systems …
Total citations
[Citations-per-year bar chart, 2015 to 2024]
Scholar articles
J Shin, S Wu, F Wang, C De Sa, C Zhang, C Ré - Proceedings of the VLDB Endowment International …, 2015