[PDF] A Survey of Deep Web Data Integration Research (Deep Web 数据集成研究综述)

W Liu, X Meng, W Meng - Chinese Journal of Computers (计算机学报), 2007 - c.xml.org.cn
With the rapid development of the World Wide Web, a tremendous amount of information is "hidden" in
the Deep Web, and its volume is increasing rapidly. The information can only be accessed by …

[PDF] RoadRunner: Towards automatic data extraction from large web sites

V Crescenzi, G Mecca, P Merialdo - VLDB, 2001 - vldb.org
The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …

[BOOK] Mashups

F Daniel, M Matera - 2014 - Springer
After introducing the core constituents of mashups, the components, in this chapter we study
what it means to integrate different components into composite applications, that is …

Automatic information extraction from large websites

V Crescenzi, G Mecca - Journal of the ACM (JACM), 2004 - dl.acm.org
Information extraction from websites is nowadays a relevant problem, usually performed by
software modules called wrappers. A key requirement is that the wrapper generation …

Wrapper approaches for web data extraction: A review

MABM Azir, KB Ahmad - 2017 6th International Conference on …, 2017 - ieeexplore.ieee.org
Relational databases are known as collections of structured data within the digital structure
and are normally arranged in rows and columns. However, most business data are present …

OXPath: A language for scalable data extraction, automation, and crawling on the deep web

T Furche, G Gottlob, G Grasso, C Schallhart, A Sellers - The VLDB Journal, 2013 - Springer
The evolution of the web has outpaced itself: A growing wealth of information and
increasingly sophisticated interfaces necessitate automated processing, yet existing …

[BOOK] Handbook of human factors in Web design

KPL Vu, RW Proctor - 2011 - books.google.com
This second edition of a bestseller provides up-to-date knowledge of human factors issues
in web design. It comprehensively treats human factors research methods, design …

[PDF] The SphereSearch engine for unified ranked retrieval of heterogeneous XML and web documents

J Graupmann, R Schenkel, G Weikum - Proceedings of the 31st …, 2005 - vldb.org
This paper presents the novel SphereSearch Engine that provides unified ranked retrieval
on heterogeneous XML and Web data. Its search capabilities include vague structure …

YAWN: A semantically annotated Wikipedia XML corpus

R Schenkel, F Suchanek, G Kasneci - 2007 - dl.gi.de
The paper presents YAWN, a system to convert the well-known and widely used Wikipedia
collection into an XML corpus with semantically rich, self-explaining tags. We introduce …

[PDF] Synthesizing an integrated ontology

D Beneventano, S Bergamaschi… - IEEE Internet …, 2003 - dbgroup.ing.unimo.it
Web approaches employ annotation techniques to link individual information resources with
machine-comprehensible metadata. Before we can realize the potential this new vision …