Bayesian data cleaning for Web data

Y Hu, S De, Y Chen, S Kambhampati - arXiv preprint arXiv:1204.3677, 2012 - arxiv.org
Data cleaning is a long-standing problem that is growing in importance with the mass of
uncurated web data. State-of-the-art approaches for handling inconsistent data are systems
that learn and use conditional functional dependencies (CFDs) to rectify data. These
methods learn data patterns (CFDs) from a clean sample of the data and use them to rectify
the dirty or inconsistent data. While getting a clean training sample is feasible in enterprise
data scenarios, it is infeasible in web databases where there is no separate curated data …
