Dealing with overlap and imbalance: a new metric and approach

Z Borsos, C Lemnaru, R Potolea - Pattern Analysis and Applications, 2018 - Springer
Abstract
This paper addresses learning in complex scenarios involving imbalance and overlap. We propose a novel measure, the Augmented R-value, for estimating the level of overlap in the data. It improves on an existing model-based measure by incorporating the data imbalance into the estimation process. We provide both a theoretical demonstration and empirical validations of the new metric’s efficacy in estimating the overlap level. Another contribution of the present paper is a collection of meta-features to be used in conjunction with a meta-learning strategy for predicting the most suitable classifier for a given problem. Evaluations on a well-known collection of benchmark problems show that the meta-learning approach outperforms the manual classifier selection normally carried out by data scientists. The results of the meta-feature selection step confirm the power of the Augmented R-value in predicting the expected performance of classifiers in such complex classification scenarios. We also found that overlap degrades classifier performance more severely than imbalance does.
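The abstract does not give the formula, so the following is only a rough sketch of how an imbalance-aware overlap measure of this kind could be computed. It assumes (1) that the underlying "model-based measure" is the k-nearest-neighbour R-value, in which an instance counts as overlapping when more than θ of its k nearest neighbours belong to another class; (2) the defaults k = 7 and θ = 3; and (3) that the augmentation combines the per-class R-values as a weighted average with the imbalance ratio IR = n_maj / n_min as the minority-class weight. All three are assumptions, and the function names `class_r_value` and `augmented_r_value` are hypothetical; consult the paper for the exact definition.

```python
import math
from collections import Counter


def class_r_value(X, y, cls, k=7, theta=3):
    """Fraction of instances of class `cls` whose k nearest neighbours
    contain more than `theta` points from another class.
    k=7, theta=3 are assumed defaults, not taken from the paper."""
    idx = [i for i, label in enumerate(y) if label == cls]
    overlapping = 0
    for i in idx:
        # Brute-force kNN: rank every other point by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        foreign = sum(1 for j in neighbours if y[j] != cls)
        if foreign > theta:
            overlapping += 1
    return overlapping / len(idx)


def augmented_r_value(X, y, k=7, theta=3):
    """Hypothetical imbalance-weighted overlap score for a binary problem.
    ASSUMPTION: minority-class overlap is weighted by the imbalance ratio,
    i.e. (R_maj + IR * R_min) / (IR + 1)."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("this sketch only handles binary labels")
    (maj, n_maj), (mino, n_min) = counts.most_common(2)
    ir = n_maj / n_min
    r_maj = class_r_value(X, y, maj, k, theta)
    r_min = class_r_value(X, y, mino, k, theta)
    return (r_maj + ir * r_min) / (ir + 1)


if __name__ == "__main__":
    # Two well-separated clusters on a line: no overlap expected.
    X = [(float(i), 0.0) for i in range(16)] + [(100.0 + i, 0.0) for i in range(8)]
    y = [0] * 16 + [1] * 8
    print(augmented_r_value(X, y))
```

Under this weighting, overlap among minority instances pulls the score up more strongly as the imbalance grows, which is one plausible way to "include the data imbalance in the estimation process". The brute-force neighbour search is O(n²) and only suitable for small illustrations.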