Web page classification: Features and algorithms

X Qi, BD Davison - ACM Computing Surveys (CSUR), 2009 - dl.acm.org
Classification of Web page content is essential to many tasks in Web information retrieval
such as maintaining Web directories and focused crawling. The uncontrolled nature of Web …

A comprehensive survey of numeric and symbolic outlier mining techniques

M Agyemang, K Barker, R Alhajj - Intelligent Data Analysis, 2006 - content.iospress.com
Data that appear to have different characteristics than the rest of the population are called
outliers. Identifying outliers from huge data repositories is a very complex task called outlier …

On strategies for imbalanced text classification using SVM: A comparative study

A Sun, EP Lim, Y Liu - Decision Support Systems, 2009 - Elsevier
Many real-world text classification tasks involve imbalanced training examples. The
strategies proposed to address the imbalanced classification (e.g., resampling, instance …

Fast webpage classification using URL features

MY Kan, HON Thi - Proceedings of the 14th ACM international …, 2005 - dl.acm.org
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing
web page classification. This approach is faster than typical web page classification, as the …

Rapid vitality estimation and prediction of corn seeds based on spectra and images using deep learning and hyperspectral imaging techniques

L Pang, S Men, L Yan, J Xiao - IEEE Access, 2020 - ieeexplore.ieee.org
Highly viable seeds are of great significance for agricultural development, and the traditional
corn seed vigor detection method is time-consuming and laborious. In this paper, the …

Ensemble learning with member optimization for fault diagnosis of a building energy system

H Han, Z Zhang, X Cui, Q Meng - Energy and Buildings, 2020 - Elsevier
For better service and energy savings, improved fault detection and diagnosis (FDD) of
building energy systems is of great importance. To achieve this aim, ensemble learning is …

Classifying illegal activities on Tor network based on web textual contents

MW Al Nabki, E Fidalgo, E Alegre… - Proceedings of the 15th …, 2017 - aclanthology.org
The freedom of the Deep Web offers a safe place where people can express themselves
anonymously but they also can conduct illegal activities. In this paper, we present and make …

Classifier and feature set ensembles for web page classification

A Onan - Journal of Information Science, 2016 - journals.sagepub.com
Web page classification is an important research direction on web mining. The abundant
amount of data available on the web makes it essential to develop efficient and robust …

Visual content-based web page categorization with deep transfer learning and metric learning

D López-Sánchez, AG Arrieta, JM Corchado - Neurocomputing, 2019 - Elsevier
The growing amounts of online multimedia content challenge the current search,
recommendation and information retrieval systems. Information in the form of visual …

Combining link-based and content-based methods for web document classification

P Calado, M Cristo, E Moura, N Ziviani… - Proceedings of the …, 2003 - dl.acm.org
This paper studies how link information can be used to improve classification results for Web
collections. We evaluate four different measures of subject similarity, derived from the Web …