A novel approach to data extraction on hyperlinked webpages

K Shaukat, N Masood, M Khushi - Applied Sciences, 2019 - mdpi.com
… We downloaded 15,000 web pages using our in-house developed web-crawler, from … web
tables and cannot efficiently answer user queries. Our novel method successfully extracted

A novel web scraping approach using the additional information obtained from web pages

E Uzun - IEEE Access, 2020 - ieeexplore.ieee.org
web scraping literature, it is observed that time efficiency is ignored. This study proposes a
novel approach, namely UzunExt, which extracts content … presents web content extraction

Roller: a novel approach to Web information extraction

P Jiménez, R Corchuelo - Knowledge and Information Systems, 2016 - Springer
… The research regarding Web information extraction focuses on learning rules to extract some
selected information from Web documents. Many proposals are ad hoc and cannot benefit …

A novel text mining approach for scholar information extraction from web content in Chinese

X Xie, Y Fu, H Jin, Y Zhao, W Cao - Future Generation Computer Systems, 2020 - Elsevier
… [12] proposed a novel approach for email clustering based on semantics. They used HowNet
… the structure of the web source, then clean the web pages and leave useful information for …

HTML web content extraction using paragraph tags

HJ Carey, M Manic - 2016 IEEE 25th International Symposium …, 2016 - ieeexplore.ieee.org
… the main content of a web page. Therefore, this paper presents Paragraph Extractor (ParEx),
a novel method used to identify the main text content within an article on a website while …

Pattern matching for extraction of core contents from news web pages

S Sirsat, V Chavan - … Second International Conference on Web …, 2016 - ieeexplore.ieee.org
… It presumes that subject content extraction of pages has been essential for Web information
pretreatment link. It can reduce the browsing time, promote the speed of user accessing to …

Entropy based informative content density approach for efficient web content extraction

M Annam, GP Sajeev - 2016 International conference on …, 2016 - ieeexplore.ieee.org
… We propose a web content extraction technique built on … a novel method, EICD (Entropy
based Informative Content Density… approach we consider the HTML content of the web page for …

Extraction of Core Web Content from Web Pages using Noise Elimination.

A Saravanan, S Sathya Bama - Journal of Engineering Science & …, 2020 - jestr.org
… also degrades the performance of content extraction. These uninteresting blocks include …
novel method to remove the noises in the web page and extracts the significant content. The …

Main content extraction from heterogeneous webpages

J Alarte, D Insa, J Silva, S Tamarit - International Conference on Web …, 2018 - Springer
… adaptation to mobile devices, web content printing, etc. We introduce a novel site-level
technique for content extraction based on the DOM representation of webpages. This technique …

Various Approaches for Content Extraction from Web Pages based on Factors

DM Kene, A Iqbal - Recent Advancements in Science and Technology, 2024 - vbmv.org
… In this method the Vision based Page Segmentation algorithm … [13] A Novel approach for
content extraction from web pages … attributes [12] of nodes for content extraction. The popular …