Web-Page Content Classification on Entropy Classifiers using Machine Learning

SA Siddiqha, M Islabudeen - 2023 International Conference for …, 2023 - ieeexplore.ieee.org
2023 International Conference for Advancement in Technology (ICONAT), 2023
In recent years, the World Wide Web (WWW) has become a global data center that allows people to store and share information. The information in Web Pages may be personal, official, commercial, or business-related, and Web users want to access it for their own needs. To use Web data for a specific purpose, therefore, techniques are needed that classify Web Pages so that the relevant data they contain can be delivered to users. This paper proposes a new technique for classifying Web Pages using level-based classification and a hierarchical indexing model built on three predefined domains: Sports, Politics, and Education. The method works in two main phases, a Training phase and a Testing phase. During the training phase, dynamic Feature Extraction and Knowledge Representation are performed; during the testing phase, the features extracted from the Web Pages are used for content matching to carry out Classification. The technique thus comprises three steps, namely Dynamic Feature Extraction, Knowledge Representation, and Classification, applied to randomly distributed Web Pages. During Feature Extraction, the important keywords are extracted from the headers and paragraphs of Web Pages. The frequency of occurrence of each keyword is determined, and the frequency values are multiplied by weights so as to segregate the keywords into different priority levels. The represented knowledge is then used for content matching to classify Web Pages. The percentage of belongingness of a Web Page to each category is calculated using a Maximum Entropy Classifier, which was chosen for its advantage in search-based optimization. The method is evaluated on three categories of Web Pages: Sports, Politics, and Education. The technique achieved a classification accuracy of 91%, which is higher than that of conventional classification techniques.
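The abstract describes the pipeline only in outline: weighted keyword frequencies taken from headers and paragraphs, followed by a Maximum Entropy Classifier that outputs a percentage of belongingness per category. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. Assumptions: header keywords are weighted 3 and paragraph keywords 1 (`SECTION_WEIGHTS`), a small hypothetical `STOPWORDS` list is applied, each feature vector is scaled to sum to 1 (`normalize`), and the Maximum Entropy Classifier is realised as multinomial logistic regression fitted by batch gradient ascent. The level-based classification and hierarchical indexing model are not reproduced. All names (`SectionParser`, `extract_features`, `normalize`, `MaxEntClassifier`, `belongingness`) and the `TRAIN` pages are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights (not given in the abstract): header keywords count 3x paragraph ones.
SECTION_WEIGHTS = {"header": 3.0, "paragraph": 1.0}
HEADER_TAGS = {f"h{i}" for i in range(1, 7)}
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "for", "by", "after"}

class SectionParser(HTMLParser):
    """Collects (section, text) pairs from <h1>-<h6> and <p> elements."""
    def __init__(self):
        super().__init__()
        self.section = None  # "header", "paragraph", or None outside tracked tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self.section = "header"
        elif tag == "p":
            self.section = "paragraph"

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS or tag == "p":
            self.section = None

    def handle_data(self, data):
        if self.section is not None:
            self.chunks.append((self.section, data))

def extract_features(html):
    """Keyword frequency multiplied by the weight of the section it occurs in."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    features = Counter()
    for section, text in parser.chunks:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                features[word] += SECTION_WEIGHTS[section]
    return features

def normalize(features):
    """Scale weighted frequencies to sum to 1 so long and short pages are comparable."""
    total = sum(features.values()) or 1.0
    return {f: v / total for f, v in features.items()}

class MaxEntClassifier:
    """Maximum-entropy (multinomial logistic) classifier fitted by gradient ascent."""
    def __init__(self, labels, lr=1.0, epochs=300, l2=0.01):
        self.labels, self.lr, self.epochs, self.l2 = list(labels), lr, epochs, l2
        self.weights = {y: {} for y in self.labels}  # per-category feature weights

    def probabilities(self, x):
        scores = {y: sum(self.weights[y].get(f, 0.0) * v for f, v in x.items())
                  for y in self.labels}
        top = max(scores.values())  # shift by the max score for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, xs, ys):
        # Batch gradient ascent on the mean log-likelihood with an L2 penalty.
        for _ in range(self.epochs):
            grad = {y: {} for y in self.labels}
            for x, y_true in zip(xs, ys):
                p = self.probabilities(x)
                for y in self.labels:
                    # Observed minus model-expected feature value (the MaxEnt gradient).
                    err = (1.0 if y == y_true else 0.0) - p[y]
                    for f, v in x.items():
                        grad[y][f] = grad[y].get(f, 0.0) + err * v / len(xs)
            for y in self.labels:
                w = self.weights[y]
                for f in set(grad[y]) | set(w):
                    old = w.get(f, 0.0)
                    w[f] = old + self.lr * (grad[y].get(f, 0.0) - self.l2 * old)
        return self

    def belongingness(self, html):
        """Percentage of belongingness of a page to each category."""
        x = normalize(extract_features(html))
        return {y: 100.0 * p for y, p in self.probabilities(x).items()}

TRAIN = [  # toy pages invented for illustration, not the paper's dataset
    ("<h1>Cricket match final</h1><p>The team won the match by six wickets.</p>", "Sports"),
    ("<h2>Football league</h2><p>The striker scored a goal for the team.</p>", "Sports"),
    ("<h1>Election results</h1><p>The party won the vote in parliament.</p>", "Politics"),
    ("<h2>Parliament debate</h2><p>The minister defended the election policy.</p>", "Politics"),
    ("<h1>University admissions</h1><p>Students apply to the school for exams.</p>", "Education"),
    ("<h2>School exams</h2><p>Teachers grade the students after the exam.</p>", "Education"),
]

if __name__ == "__main__":
    clf = MaxEntClassifier(["Sports", "Politics", "Education"])
    clf.fit([normalize(extract_features(h)) for h, _ in TRAIN], [y for _, y in TRAIN])
    page = "<h1>Team wins final match</h1><p>The team scored in the final.</p>"
    print({y: round(pct, 1) for y, pct in clf.belongingness(page).items()})
```

Run as a script, the sketch trains on the six toy pages and prints the belongingness percentages for one unseen page. Every keyword in that page that the model has seen (team, final, match, scored) occurs only in Sports training pages, so the Sports percentage comes out highest. Multinomial logistic regression is used here because it is the model that maximises conditional entropy subject to matching the observed feature expectations, which is the usual way a Maximum Entropy Classifier is trained.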