A survey of web information extraction systems

CH Chang, M Kayed, MR Girgis… - IEEE transactions on …, 2006 - ieeexplore.ieee.org
The Internet presents a huge amount of useful information which is usually formatted for its
users, which makes it difficult to extract relevant data from various sources. Therefore, the …

[BOOK][B] Web data mining: exploring hyperlinks, contents, and usage data

B Liu - 2011 - Springer
Liu has written a comprehensive text on Web mining, which consists of two parts. The first
part covers the data mining and machine learning foundations, where all the essential …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

[PDF][PDF] RoadRunner: Towards automatic data extraction from large web sites

V Crescenzi, G Mecca, P Merialdo - VLDB, 2001 - vldb.org
The paper investigates techniques for extracting data from HTML sites through the use of
automatically generated wrappers. To automate the wrapper generation and the data …

FlashExtract: A framework for data extraction by examples

V Le, S Gulwani - Proceedings of the 35th ACM SIGPLAN Conference …, 2014 - dl.acm.org
Various document types that combine model and view (e.g., text files, webpages,
spreadsheets) make it easy to organize (possibly hierarchical) data, but make it difficult to …

Extracting structured data from web pages

A Arasu, H Garcia-Molina - Proceedings of the 2003 ACM SIGMOD …, 2003 - dl.acm.org
Many web sites contain large sets of pages generated using a common template or layout.
For example, Amazon lays out the author, title, comments, etc. in the same way in all its book …

[BOOK][B] Domain-specific knowledge graph construction

M Kejriwal - 2019 - Springer
Domain-specific knowledge graphs have emerged as a field unto their own, steadily and
perhaps not so slowly. Graphs have been pervasive in AI for a long period of time, dating …

Web data extraction based on partial tree alignment

Y Zhai, B Liu - Proceedings of the 14th international conference on …, 2005 - dl.acm.org
This paper studies the problem of extracting data from a Web page that contains several
structured data records. The objective is to segment these data records, extract data …

Mining data records in web pages

B Liu, R Grossman, Y Zhai - Proceedings of the ninth ACM SIGKDD …, 2003 - dl.acm.org
A large amount of information on the Web is contained in regularly structured objects, which
we call data records. Such data records are important because they often present the …

Data extraction and label assignment for web databases

J Wang, FH Lochovsky - … of the 12th international conference on World …, 2003 - dl.acm.org
Many tools have been developed to help users query, extract and integrate data from web
pages generated dynamically from databases, i.e., from the Hidden Web. A key prerequisite …