Cut and paste

Web mining research: A survey

R Kosala, H Blockeel - ACM SIGKDD Explorations Newsletter, 2000 - dl.acm.org

With the huge amount of information available online, the World Wide Web is a fertile area
for data mining research. The Web mining research is at the cross road of research from …

Cited by 2598 - Related articles - All 47 versions

[PDF] Roadrunner: Towards automatic data extraction from large web sites

V Crescenzi, G Mecca, P Merialdo - VLDB, 2001 - vldb.org

The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …

Cited by 1592 - Related articles - All 27 versions

Form-based ontology creation and information harvesting

DW Embley, C Tao, SW Liddle - US Patent 8,103,962, 2012 - Google Patents

Extracting data from web pages. User input is received defining a tabular form. User input is
received correlating portions of the form with user selected data items contained in one or …

Cited by 304 - Related articles - All 4 versions

Generating finite-state transducers for semi-structured data extraction from the web

CN Hsu, MT Dung - Information systems, 1998 - Elsevier

Integrating a large number of Web information sources may significantly increase the utility
of the World-Wide Web. A promising solution to the integration is through the use of a Web …

Cited by 722 - Related articles - All 8 versions

[PDF] Visual web information extraction with Lixto

R Baumgartner, S Flesca, G Gottlob - 2001 - ora.ox.ac.uk

We present new techniques for supervised wrapper generation and automated web
information extraction, and a system called Lixto implementing these techniques. Our system …

Cited by 812 - Related articles - All 31 versions

XWRAP: An XML-enabled wrapper construction system for web information sources

L Liu, C Pu, W Han - … of 16th International Conference on Data …, 2000 - ieeexplore.ieee.org

The paper describes the methodology and the software development of XWRAP, an XML-
enabled wrapper construction system for semi-automatic generation of wrapper programs …

Cited by 772 - Related articles - All 13 versions

[PDF] A hierarchical approach to wrapper induction

I Muslea, S Minton, C Knoblock - … of the third annual conference on …, 1999 - dl.acm.org

With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …

Cited by 624 - Related articles - All 11 versions

Conceptual-model-based data extraction from multiple-record web pages

DW Embley, DM Campbell, YS Jiang, SW Liddle… - Data & Knowledge …, 1999 - Elsevier

Electronically available data on the Web is exploding at an ever increasing pace. Much of
this data is unstructured, which makes searching hard and traditional database querying …

Cited by 552 - Related articles - All 10 versions

Hierarchical wrapper induction for semistructured information sources

I Muslea, S Minton, CA Knoblock - Autonomous Agents and Multi-Agent …, 2001 - Springer

With the tremendous amount of information that becomes available on the Web on a daily
basis, the ability to quickly develop information agents has become a crucial problem. A vital …

Cited by 544 - Related articles - All 13 versions

Record-boundary discovery in Web documents

DW Embley, Y Jiang, YK Ng - Proceedings of the 1999 ACM SIGMOD …, 1999 - dl.acm.org

Extraction of information from unstructured or semistructured Web documents often requires
a recognition and delimitation of records. (By “record” we mean a group of information …

Cited by 472 - Related articles - All 13 versions

Web mining research: A survey