Deep web data extraction based on visual information processing

J Liu, L Lin, Z Cai, J Wang, H Kim
Journal of Ambient Intelligence and Humanized Computing, 2024, Springer
Abstract
With the rapid development of technology, the Web has become the largest encyclopedic database. Although users can conveniently get information on the surface web with applications such as browsers, it is hard to retrieve information from the deep web. The deep web requires a user to submit a query to the server, which retrieves information from its database to generate the result webpage. Methods different from traditional Web surfing are therefore needed to extract data from the deep web. Most existing deep web data extraction methods are based on DOM tree analysis. In this paper, to fully utilize the visual information contained in a webpage, we propose a data region locating method based on a convolutional neural network and a segmentation algorithm based on visual information. To verify the efficiency of the proposed method, we apply it to real-world commercial websites to perform data extraction. Experiments on the data region location model, data extraction, and data item alignment verify that the proposed method can effectively improve the accuracy of data region location and the efficiency of data extraction.
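The abstract does not describe how the segmentation algorithm works, so the following is only a minimal sketch of one common way visual layout can be used to split a located data region into records: group rendered elements by their bounding boxes and start a new record wherever the vertical gap is large. The `Box` structure, the `segment_records` function, the 12-pixel `gap_threshold`, and the sample coordinates are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one page element (hypothetical structure)."""
    text: str
    x: float
    y: float
    width: float
    height: float

    @property
    def bottom(self) -> float:
        return self.y + self.height


def segment_records(boxes, gap_threshold=12.0):
    """Group boxes into records, splitting wherever the vertical gap
    between the current group and the next box exceeds gap_threshold.

    This is an illustrative heuristic, not the paper's algorithm.
    """
    records = []
    current = []
    current_bottom = None
    # Visit boxes top to bottom, then left to right.
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if current and box.y - current_bottom > gap_threshold:
            records.append(current)
            current = []
            current_bottom = None
        current.append(box)
        # Track the lowest edge seen so far in the current group.
        if current_bottom is None:
            current_bottom = box.bottom
        else:
            current_bottom = max(current_bottom, box.bottom)
    if current:
        records.append(current)
    return records


# Assumed sample layout: two product entries stacked vertically.
boxes = [
    Box("Laptop A", 10, 100, 200, 20),
    Box("$999", 10, 124, 60, 16),
    Box("Laptop B", 10, 180, 200, 20),
    Box("$1,099", 10, 204, 60, 16),
]
records = segment_records(boxes)
for record in records:
    print([box.text for box in record])
```

In this sketch the only visual cue is whitespace between elements. A DOM-based method would instead compare subtree structure, which is the contrast the abstract draws.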