Simplifying access to large-scale structured datasets by meta-profiling with scalable training set enrichment

S Pavia, R Khan, A Pyayt, M Gubanov - Proceedings of the 2022 International Conference on Management of Data, 2022 - dl.acm.org
Accessing large-scale structured datasets such as WDC [21], which contain millions of tables coming from hundreds of thousands of sources, is very challenging [11, 13, 14, 30, 31]. Even if only one topic (e.g., job postings) is of interest, job tables from different sources have hundreds of different schemas, which significantly complicates both finding and querying them.
Here we demonstrate our scalable Meta-data profiler, which constructs a standardized interface to a topic of interest in large-scale structured datasets using deep learning and our new unsupervised, scalable training set enrichment algorithm. This interface, called a Meta-profile, is a metadata summary for each topic that is representative of the entire dataset. It helps data scientists and end users access all relevant topical tables, even in ultra-large-scale datasets such as WDC, where this would otherwise be very difficult or impossible [22, 31].
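To make the idea of a per-topic Meta-profile more concrete, here is a toy sketch in Python. It assumes that a Meta-profile can be approximated as the set of normalized column names occurring in at least a given fraction of tables already known to belong to one topic, and that such a profile can then be used to find other topical tables by attribute overlap. The function names (`normalize`, `build_meta_profile`, `find_topical_tables`), the `min_support` and `min_overlap` thresholds, and the sample headers are all hypothetical. This is not the authors' method, which relies on deep learning and training set enrichment rather than exact name matching (note that "Job Title" is not matched to "title" here).

```python
from collections import Counter
from typing import Dict, List


def normalize(name: str) -> str:
    """Lower-case and trim a column name (hypothetical normalization step)."""
    return name.strip().lower()


def build_meta_profile(tables: List[List[str]], min_support: float = 0.5) -> Dict[str, float]:
    """Map each normalized column name that occurs in at least `min_support`
    of the given topical tables to the fraction of tables containing it."""
    if not tables:
        return {}
    counts: Counter = Counter()
    for header in tables:
        # Use a set so a name repeated within one table is counted once.
        counts.update({normalize(col) for col in header})
    n = len(tables)
    # Most frequent attributes first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {attr: c / n for attr, c in ranked if c / n >= min_support}


def find_topical_tables(tables: List[List[str]], profile: Dict[str, float],
                        min_overlap: int = 2) -> List[int]:
    """Return indices of tables that share at least `min_overlap`
    normalized column names with the profile."""
    keys = set(profile)
    return [i for i, header in enumerate(tables)
            if len(keys & {normalize(col) for col in header}) >= min_overlap]


# Hypothetical job-posting tables with heterogeneous schemas.
jobs = [
    ["Title", "Company", "Location", "Salary"],
    ["title", "company", "posted"],
    ["Job Title", "Company", "Location"],
]
profile = build_meta_profile(jobs)
print(profile)
# {'company': 1.0, 'location': 0.6666666666666666, 'title': 0.6666666666666666}

# A non-job table shares only "location" with the profile and is skipped.
corpus = jobs + [["Player", "Team", "Location"]]
print(find_topical_tables(corpus, profile))  # [0, 1, 2]
```

Raising `min_support` yields a smaller, stricter profile. The exact-match normalization is the weakest part of this sketch: synonymous headers such as "Job Title" and "title" are treated as different attributes, which is where a learned matcher would presumably be needed.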
References