A systematic review of current trends in web content mining

MO Samuel, AI Tolulope… - Journal of Physics …, 2019 - iopscience.iop.org
Knowledge in web documents, relevance ranking of webpages and so on are
some of the under-researched areas in web content mining (WCM). Apart from the general …

Web data extraction approach for deep web using WEIDJ

IAA Sabri, M Man, WAWA Bakar, ANM Rose - Procedia Computer Science, 2019 - Elsevier
Data extraction is one of the most prominent areas in data mining analysis and has been
extensively studied, especially in the field of data requirements and reservoir. The main aim …

Scraping relevant images from web pages without download

E Uzun - ACM Transactions on the Web, 2023 - dl.acm.org
Automatically scraping relevant images from web pages is an error-prone and
time-consuming task, leading experts to prefer manually preparing extraction patterns for a …

Semantic web mining for content-based online shopping recommender systems

IT Afolabi, OS Makinde, OO Oladipupo - International Journal of …, 2019 - igi-global.com
Currently, for content-based recommendations, semantic analysis of text from webpages
seems to be a major problem. In this research, we present a semantic web content mining …

Main content extraction from web pages based on node characteristics

Q Liu, M Shao, L Wu, G Zhao, G Fan… - Journal of Computing …, 2017 - koreascience.kr
Main content extraction of web pages is widely used in search engines, web content
aggregation and mobile Internet browsing. However, a mass of irrelevant information such …

Automatic Regular Expression Generation for Extracting Relevant Image Data From Web Pages using Genetic Algorithms

C Aslanyürek, T Yerlikaya - IEEE Access, 2024 - ieeexplore.ieee.org
In this study, a method that automatically generates regular expressions using genetic
algorithms is designed to extract relevant images on web pages. Data extraction, which is …

Improving performance of DOM in semi-structured data extraction using WEIDJ model

IAA Sabri, M Man - Indonesian Journal of Electrical Engineering and …, 2018 - academia.edu
Web data extraction is the process of extracting user-required information from a web page.
The information consists of semi-structured data, not data in a structured format. The extraction data …

WEIDJ: Development of a new algorithm for semi-structured web data extraction

IAA Sabri, M Man - … Telecommunication Computing Electronics …, 2021 - telkomnika.uad.ac.id
In the era of industrial digitalization, people are increasingly investing in solutions that allow
their process for data collection, data analysis and performance improvement. In this paper …

WEIDJ: An improvised algorithm for image extraction from web pages

IAA Sabri, M Man - 2017 8th International Conference on …, 2017 - ieeexplore.ieee.org
The World Wide Web (WWW) is a huge information repository and is rapidly growing as a source of
information. Web pages are known as semi-structured data and contain a variety of …

Automatically discovering relevant images from web pages

E Uzun, E Özhan, HV Agun, T Yerlikaya… - IEEE …, 2020 - ieeexplore.ieee.org
Web pages contain irrelevant images along with relevant images. The classification of these
images is an error-prone process due to the number of design variations of web pages …