Wrapper generation for semi-structured internet sources

S Soderland - Machine learning, 1999 - Springer

A wealth of on-line text information can be made available to automatic processing by
information extraction (IE) systems. Each IE application needs a separate set of rules tuned …

Cited by 1661  Related articles  All 12 versions

[PDF] microsoft.com

VIPS: a vision-based page segmentation algorithm

D Cai, S Yu, JR Wen, WY Ma - 2003 - microsoft.com

A new web content structure analysis based on visual representation is proposed in this
paper. Many web applications such as information retrieval, information extraction and …

Cited by 997  Related articles  All 10 versions

[PDF] psu.edu

[PDF] Mining Product Reputations on the WEB

S Morinaga - Proceedings of 8th ACM SIGKDD International …, 2002 - Citeseer

Knowing the reputations of your own and/or competitors' products is important for marketing
and customer relationship management. It is, however, very costly to collect and analyze …

Cited by 685  Related articles

Data analysis by web scraping using python

DM Thomas, S Mathur - 2019 3rd International conference on …, 2019 - ieeexplore.ieee.org

The standard information investigation are built on the root and impact relationship, shaped
an example minuscule examination, subjective and quantitative examination, the rationality …

Cited by 139  Related articles

[PDF] googleapis.com

Form-based ontology creation and information harvesting

DW Embley, C Tao, SW Liddle - US Patent 8,103,962, 2012 - Google Patents

Extracting data from web pages. User input is received defining a tabular form. User input is
received correlating portions of the form with user selected data items contained in one or …

Cited by 313  Related articles  All 4 versions

[PDF] acm.org

Database techniques for the World-Wide Web: A survey

D Florescu, A Levy, A Mendelzon - ACM Sigmod Record, 1998 - dl.acm.org

The popularity of the World-Wide Web (WWW) has made it a prime vehicle for disseminating
information. The relevance of database concepts to the problems of managing and querying …

Cited by 987  Related articles  All 32 versions

[PDF] acm.org

NoDoSE—a tool for semi-automatically extracting structured and semistructured data from text documents

B Adelberg - Proceedings of the 1998 ACM SIGMOD international …, 1998 - dl.acm.org

Often interesting structured or semistructured data is not in database systems but in HTML
pages, text files, or on paper. The data in these formats is not usable by standard query …

Cited by 617  Related articles  All 11 versions

[PDF] psu.edu

Conceptual-model-based data extraction from multiple-record web pages

DW Embley, DM Campbell, YS Jiang, SW Liddle… - Data & Knowledge …, 1999 - Elsevier

Electronically available data on the Web is exploding at an ever increasing pace. Much of
this data is unstructured, which makes searching hard and traditional database querying …

Cited by 552  Related articles  All 10 versions

[PDF] aaai.org

[PDF] Navigational plans for data integration

MT Friedman, AY Levy, TD Millstein - AAAI/IAAI, 1999 - cdn.aaai.org

We consider the problem of building data integration systems when the data sources are
webs of data, rather than sets of relations. Previous approaches to modeling data sources …

Cited by 509  Related articles  All 11 versions

[PDF] psu.edu

Object-level ranking: bringing order to web objects

Z Nie, Y Zhang, JR Wen, WY Ma - … of the 14th international conference on …, 2005 - dl.acm.org

In contrast with the current Web search methods that essentially do document-level ranking
and retrieval, we are exploring a new paradigm to enable Web search at the object level. We …

Cited by 431  Related articles  All 14 versions
