A systematic review of current trends in web content mining

MO Samuel, AI Tolulope… - Journal of Physics …, 2019 - iopscience.iop.org
Knowledge in web documents, Relevance ranking of webpages and so on are
some of the under-researched areas in web content mining (WCM). Apart from the general …

HTML web content extraction using paragraph tags

HJ Carey, M Manic - 2016 IEEE 25th International Symposium …, 2016 - ieeexplore.ieee.org
With the ever expanding use of the internet to disseminate information across the world,
gathering useful information from the multitude of web page styles continues to be a difficult …

Semantic web mining for content-based online shopping recommender systems

IT Afolabi, OS Makinde, OO Oladipupo - International Journal of …, 2019 - igi-global.com
Currently, for content-based recommendations, semantic analysis of text from webpages
seems to be a major problem. In this research, we present a semantic web content mining …

Semantics based web ranking using a robust weight scheme

RV Priya, V Vijayakumar, L Yang - International Journal of Web …, 2019 - igi-global.com
In this paper, HTML tags and attributes are used to determine different structural position of
text in a web page. Tags-attributes based models are used to assign a weight to a text that …

Web content extraction based on subject detection and node density

W Petprasit, S Jaiyen - 2015 7th International Conference on …, 2015 - ieeexplore.ieee.org
Currently, very large data have been transferred from everywhere through World Wide Web.
Consequently, the information extraction systems have been arising and many researches …

Marketplace affiliates potential analysis using cosine similarity and vision-based page segmentation

WB Zulfikar, M Irfan, M Ghufron, J Jumadi… - Bulletin of Electrical …, 2020 - beei.org
One success factor of an online affiliate is determined by the quality of the content source.
Therefore, affiliate marketplaces need to do an objective assessment to retrieve content data …

Various Approaches for Content Extraction from Web Pages based on Factors

DM Kene, A Iqbal - Recent Advancements in Science and Technology, 2024 - vbmv.org
With the huge development of the internet and web publishing techniques generally create
numerous information sources published as HTML pages on World Wide Web. So Extraction …

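Research on Ranking Algorithms for Link-Free Documents

蒋招龙, 赵泽茂 - Journal of Hangzhou Dianzi University (Natural Sciences), 2015 - cqvip.com
With the arrival of the big data era, data formats have become diverse, and processing Web data is no longer limited to web page links;
documents without link structure must also be handled. How to obtain the required information from massive document collections is a problem that search engines urgently need to solve …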

A Research on Web Content Extraction and Noise Reduction through Text Density Using Malicious URL Pattern Detection

C Patel, H Diwanji - 2016 - academia.edu
A Web Page has large amount of information including some additional
contents like hyperlinks, header footer, navigational panel; advertisements which may cause …

An Improved VIPS-based Algorithm of Extracting Web Content

L Li, AM Zhou, Y Fang, L Liu, Q Wu - Applied Mechanics and …, 2014 - Trans Tech Publ
The paper studies the VIPS algorithm, and improves VIPS which has the deficiency with
complex rules and low performance, according that the Web page has the feature of DIV …