Web content extraction based on subject detection and node density

W Petprasit, S Jaiyen - 2015 7th International Conference on …, 2015 - ieeexplore.ieee.org
Currently, very large amounts of data are transferred from everywhere through the World Wide Web.
Consequently, information extraction systems have been emerging and many studies …

A novel approach to automatically extracting main content of Web News

X Wang, W Wang, B Liu, Z Wang… - … Conference on E …, 2009 - ieeexplore.ieee.org
Recently, the Web has become a data repository. In order to obtain the relevant information
from the repository, much research has been done. The typical function of Web news …

An efficient method for extracting web news content

J Sun, L Tang, D Liao, V Chang - … International Conference on …, 2017 - ieeexplore.ieee.org
Web news extraction is a very important step in the process of Web intelligent information
processing. It is the basis of research and application of network public opinion monitoring …

Web page content extraction method based on link density and statistic

D Pan, S Qiu, D Yin - 2008 4th International Conference on …, 2008 - ieeexplore.ieee.org
Web page content extraction is a key step for knowledge acquisition from the Internet. The
physical layout of Web pages is always composed of useful information, advertising links …

Web content information extraction approach based on removing noise and content-features

D Yang, J Song - … conference on web information systems and …, 2010 - ieeexplore.ieee.org
This paper presents an improved approach to extract the main content from web pages.
There are a good many financial news pages which have so many links that the algorithms …

Research on Web information extraction based on spider algorithm and DOM thinking

X Han, XD Li, Q Zheng - 2010 International Conference on …, 2010 - ieeexplore.ieee.org
The structural characteristics of websites are complicated and Web information structure is neither
fixed nor tidy, so capturing Web information at large scale is inefficient, the …

ContentEx: a framework for automatic content extraction programs

L Song, X Cheng, Y Guo, Y Liu… - 2009 IEEE International …, 2009 - ieeexplore.ieee.org
Web pages are often decorated with extraneous information (such as navigation bars,
branding banners, JavaScript and advertisements). This kind of information may distract …

ECON: an approach to extract content from web news page

Y Guo, H Tang, L Song, Y Wang… - 2010 12th International …, 2010 - ieeexplore.ieee.org
This paper provides a simple but effective approach, named ECON, to fully automatically
extract content from Web news pages. ECON uses a DOM tree to represent the Web news …

A comprehensive survey on web content extraction algorithms and techniques

SM Al-Ghuribi, S Alshomrani - 2013 International Conference …, 2013 - ieeexplore.ieee.org
Web Content Extraction is an important problem that has been studied through different
approaches and algorithms. It is concerned with extracting meaningful and useful data from the …

Web content information extraction based on DOM tree and statistical information

X Yu, Z Jin - 2017 IEEE 17th International Conference on …, 2017 - ieeexplore.ieee.org
The rapidly growing number of web pages carry a lot of information, yet they contain little content and much
unrelated noise, such as script code, links, advertising and so on. These …