Learning information extraction rules for semi-structured and free text

S Soderland - Machine learning, 1999 - Springer
A wealth of on-line text information can be made available to automatic processing by
information extraction (IE) systems. Each IE application needs a separate set of rules tuned …

VIPS: a vision-based page segmentation algorithm

D Cai, S Yu, JR Wen, WY Ma - 2003 - microsoft.com
A new web content structure analysis based on visual representation is proposed in this
paper. Many web applications such as information retrieval, information extraction and …

Mining Product Reputations on the WEB

S Morinaga - Proceedings of 8th ACM SIGKDD International …, 2002 - Citeseer
Knowing the reputations of your own and/or competitors' products is important for marketing
and customer relationship management. It is, however, very costly to collect and analyze …

Data analysis by web scraping using python

DM Thomas, S Mathur - 2019 3rd International conference on …, 2019 - ieeexplore.ieee.org
The standard information investigation are built on the root and impact relationship, shaped
an example minuscule examination, subjective and quantitative examination, the rationality …

Form-based ontology creation and information harvesting

DW Embley, C Tao, SW Liddle - US Patent 8,103,962, 2012 - Google Patents
Extracting data from web pages. User input is received defining a tabular form. User input is
received correlating portions of the form with user selected data items contained in one or …

Database techniques for the World-Wide Web: A survey

D Florescu, A Levy, A Mendelzon - ACM Sigmod Record, 1998 - dl.acm.org
The popularity of the World-Wide Web (WWW) has made it a prime vehicle for disseminating
information. The relevance of database concepts to the problems of managing and querying …

NoDoSE—a tool for semi-automatically extracting structured and semistructured data from text documents

B Adelberg - Proceedings of the 1998 ACM SIGMOD international …, 1998 - dl.acm.org
Often interesting structured or semistructured data is not in database systems but in HTML
pages, text files, or on paper. The data in these formats is not usable by standard query …

Conceptual-model-based data extraction from multiple-record web pages

DW Embley, DM Campbell, YS Jiang, SW Liddle… - Data & Knowledge …, 1999 - Elsevier
Electronically available data on the Web is exploding at an ever increasing pace. Much of
this data is unstructured, which makes searching hard and traditional database querying …

Navigational plans for data integration

MT Friedman, AY Levy, TD Millstein - AAAI/IAAI, 1999 - cdn.aaai.org
We consider the problem of building data integration systems when the data sources are
webs of data, rather than sets of relations. Previous approaches to modeling data sources …

Object-level ranking: bringing order to web objects

Z Nie, Y Zhang, JR Wen, WY Ma - … of the 14th international conference on …, 2005 - dl.acm.org
In contrast with the current Web search methods that essentially do document-level ranking
and retrieval, we are exploring a new paradigm to enable Web search at the object level. We …