Web mining research: A survey

R Kosala, H Blockeel - ACM Sigkdd Explorations Newsletter, 2000 - dl.acm.org
With the huge amount of information available online, the World Wide Web is a fertile area
for data mining research. The Web mining research is at the crossroads of research from …

Roadrunner: Towards automatic data extraction from large web sites

V Crescenzi, G Mecca, P Merialdo - VLDB, 2001 - vldb.org
The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …

Form-based ontology creation and information harvesting

DW Embley, C Tao, SW Liddle - US Patent 8,103,962, 2012 - Google Patents
Extracting data from web pages. User input is received defining a tabular form. User input is
received correlating portions of the form with user selected data items contained in one or …

Generating finite-state transducers for semi-structured data extraction from the web

CN Hsu, MT Dung - Information Systems, 1998 - Elsevier
Integrating a large number of Web information sources may significantly increase the utility
of the World-Wide Web. A promising solution to the integration is through the use of a Web …

Visual web information extraction with Lixto

R Baumgartner, S Flesca, G Gottlob - 2001 - ora.ox.ac.uk
We present new techniques for supervised wrapper generation and automated web
information extraction, and a system called Lixto implementing these techniques. Our system …

XWRAP: An XML-enabled wrapper construction system for web information sources

L Liu, C Pu, W Han - … of 16th International Conference on Data …, 2000 - ieeexplore.ieee.org
The paper describes the methodology and the software development of XWRAP, an XML-
enabled wrapper construction system for semi-automatic generation of wrapper programs …

A hierarchical approach to wrapper induction

I Muslea, S Minton, C Knoblock - … of the third annual conference on …, 1999 - dl.acm.org
With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …

Conceptual-model-based data extraction from multiple-record web pages

DW Embley, DM Campbell, YS Jiang, SW Liddle… - Data & Knowledge …, 1999 - Elsevier
Electronically available data on the Web is exploding at an ever-increasing pace. Much of
this data is unstructured, which makes searching hard and traditional database querying …

Hierarchical wrapper induction for semistructured information sources

I Muslea, S Minton, CA Knoblock - Autonomous Agents and Multi-Agent …, 2001 - Springer
With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …

Record-boundary discovery in Web documents

DW Embley, Y Jiang, YK Ng - Proceedings of the 1999 ACM SIGMOD …, 1999 - dl.acm.org
Extraction of information from unstructured or semistructured Web documents often requires
a recognition and delimitation of records. (By “record” we mean a group of information …