information sources published as HTML pages on the World Wide Web. However, web pages also
contain a lot of redundant and irrelevant information. Navigation panels, tables of
contents (TOC), advertisements, copyright statements, service catalogs, privacy policies, etc.
on web pages are considered redundant and irrelevant content. Such information makes
various web mining tasks such as web page crawling, web page classification, link-based …