ViPER: augmenting automatic information extraction with visual perceptions

K Simon, G Lausen - Proceedings of the 14th ACM international …, 2005 - dl.acm.org
In this paper we address the problem of unsupervised Web data extraction. We show that
unsupervised Web data extraction becomes feasible when supposing pages that are made …

Extracting data records from the web using tag path clustering

G Miao, J Tatemura, WP Hsiung, A Sawires… - Proceedings of the 18th …, 2009 - dl.acm.org
Fully automatic methods that extract lists of objects from the Web have been studied
extensively. Record extraction, the first step of this object extraction process, identifies a set …

FiVaTech: Page-level web data extraction from template pages

M Kayed, CH Chang - IEEE transactions on knowledge and …, 2009 - ieeexplore.ieee.org
Web data extraction has been an important part for many Web data analysis applications. In
this paper, we formulate the data extraction problem as the decoding process of page …

A survey on region extractors from web documents

HA Sleiman, R Corchuelo - IEEE Transactions on Knowledge …, 2012 - ieeexplore.ieee.org
Extracting information from web documents has become a research area in which new
proposals sprout out year after year. This has motivated several researchers to work on …

Unlocking social media and user generated content as a data source for knowledge management

J Meneghello, N Thompson, K Lee… - International Journal of …, 2020 - igi-global.com
The pervasiveness of social media and user-generated content has triggered an
exponential increase in global data. However, due to collection and extraction challenges …

Indexing the invisible web: a survey

Y Ru, E Horowitz - Online Information Review, 2005 - emerald.com
Purpose: The existence and continued growth of the invisible web creates a major
challenge for search engines that are attempting to organize all of the material on the web …

Web record extraction with Invariants

Z Chen, W Meng, E Dragut - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Web records are structured data on a Web page that embeds records retrieved from an
underlying database according to some templates. Mining data records on the Web enables …

AutoRM: An effective approach for automatic Web data record mining

S Shi, C Liu, Y Shen, C Yuan, Y Huang - Knowledge-Based Systems, 2015 - Elsevier
A Web database typically responds to a query with a Web page, which encodes the query
results into semi-structured data objects using HTML tags. We call such data objects Web …

STEM: a suffix tree-based method for web data records extraction

Y Fang, X Xie, X Zhang, R Cheng, Z Zhang - Knowledge and Information …, 2018 - Springer
To automatically extract data records from Web pages, the data record extraction algorithm
is required to be robust and efficient. However, most of existing algorithms are not robust …

Towards a unified solution: data record region detection and segmentation

L Bing, W Lam, Y Gu - Proceedings of the 20th ACM international …, 2011 - dl.acm.org
Although the task of data record extraction from Web pages has been studied extensively,
yet it fails to handle many pages due to their complexity in format or layout. In this paper, we …