Authors
Eli Cortez, Altigran S da Silva
Publication date
2010/6/11
Book
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
Pages
49-54
Description
Information extraction by text segmentation (IETS) applies to cases in which the data values of interest are organized in implicit semi-structured records available in textual sources (e.g., postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. We report here partial results from PhD thesis work in which we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data to associate segments in the input string with attributes of a given domain. Unlike other approaches, we rely on very effective matching strategies instead of explicit learning strategies. The effectiveness of these matching strategies is also exploited to disambiguate the extraction of certain attributes through a …
Total citations
Citations per year, 2012 to 2016 (chart not reproduced)
Scholar articles
E Cortez, AS da Silva - Proceedings of the Fourth SIGMOD PhD Workshop on …, 2010