RoadRunner: automatic data extraction from data-intensive web sites

Liu Wei, Meng Xiaofeng, Meng Weiyi - Chinese Journal of Computers, 2007 - c.xml.org.cn

With the rapid development of the World Wide Web, a tremendous amount of information is "hidden" in
the Deep Web, and its volume is increasing rapidly. The information can only be accessed by …

Cited by 140 · Related articles · All 4 versions

[PDF] vldb.org

[PDF] Roadrunner: Towards automatic data extraction from large web sites

V Crescenzi, G Mecca, P Merialdo - VLDB, 2001 - vldb.org

The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …

Cited by 1592 · Related articles · All 27 versions

[PDF] floriandaniel.it

[BOOK] Mashups

F Daniel, M Matera - 2014 - Springer

After introducing the core constituents of mashups, the components, in this chapter we study
what it means to integrate different components into composite applications, that is …

Cited by 172 · Related articles · All 9 versions

[PDF] psu.edu

Automatic information extraction from large websites

V Crescenzi, G Mecca - Journal of the ACM (JACM), 2004 - dl.acm.org

Information extraction from websites is nowadays a relevant problem, usually performed by
software modules called wrappers. A key requirement is that the wrapper generation …

Cited by 255 · Related articles · All 8 versions

Wrapper approaches for web data extraction: A review

MABM Azir, KB Ahmad - 2017 6th International Conference on …, 2017 - ieeexplore.ieee.org

Relational databases are known as collections of structured data within the digital structure
and are normally arranged in rows and columns. However, most business data are present …

Cited by 25 · Related articles

[PDF] academia.edu

OXPath: A language for scalable data extraction, automation, and crawling on the deep web

T Furche, G Gottlob, G Grasso, C Schallhart, A Sellers - The VLDB Journal, 2013 - Springer

The evolution of the web has outpaced itself: A growing wealth of information and
increasingly sophisticated interfaces necessitate automated processing, yet existing …

Cited by 126 · Related articles · All 15 versions

[PDF] researchgate.net

[BOOK] Handbook of human factors in Web design

KPL Vu, RW Proctor - 2011 - books.google.com

This second edition of a bestseller provides up-to-date knowledge of human factors issues
in web design. It comprehensively treats human factors research methods, design …

Cited by 143 · Related articles · All 9 versions

[PDF] vldb.org

[PDF] The SphereSearch engine for unified ranked retrieval of heterogeneous XML and web documents

J Graupmann, R Schenkel, G Weikum - Proceedings of the 31st …, 2005 - vldb.org

This paper presents the novel SphereSearch Engine that provides unified ranked retrieval
on heterogeneous XML and Web data. Its search capabilities include vague structure …

Cited by 157 · Related articles · All 15 versions

[PDF] gi.de

YAWN: A semantically annotated Wikipedia XML corpus

R Schenkel, F Suchanek, G Kasneci - 2007 - dl.gi.de

The paper presents YAWN, a system to convert the well-known and widely used Wikipedia
collection into an XML corpus with semantically rich, self-explaining tags. We introduce …

Cited by 145 · Related articles · All 7 versions

[PDF] unimo.it

[PDF] Synthesizing an integrated ontology

D Beneventano, S Bergamaschi… - IEEE Internet …, 2003 - dbgroup.ing.unimo.it

Web approaches employ annotation techniques to link individual information resources with
machine-comprehensible metadata. Before we can realize the potential this new vision …

Cited by 137 · Related articles · All 18 versions
