Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement

G Forman, M Scholz - ACM SIGKDD Explorations Newsletter, 2010 - dl.acm.org
Cross-validation is a mainstay for measuring performance and progress in machine
learning. There are subtle differences in how exactly to compute accuracy, F-measure and …

On model evaluation under non-constant class imbalance

J Brabec, T Komárek, V Franc, L Machlica - Computational Science–ICCS …, 2020 - Springer
Many real-world classification problems are significantly class-imbalanced to the detriment of
the class of interest. The standard set of proper evaluation metrics is well-known but the …

Classifier cascades and trees for minimizing feature evaluation cost

Z Xu, MJ Kusner, KQ Weinberger, M Chen… - The Journal of Machine …, 2014 - jmlr.org
Machine learning algorithms have successfully entered industry through many real-world
applications (e.g., search engines and product recommendations). In these …

On applied research in machine learning

F Provost, R Kohavi - Machine Learning, 1998 - ai.stanford.edu
Common arguments for including applications papers in the Machine Learning literature are
often based on the papers' value for advertising success stories and for morale boosting. For …

Class imbalances versus small disjuncts

T Jo, N Japkowicz - ACM SIGKDD Explorations Newsletter, 2004 - dl.acm.org
It is often assumed that class imbalances are responsible for significant losses of
performance in standard classifiers. The purpose of this paper is to question whether …

Precision-recall versus accuracy and the role of large data sets

B Juba, HS Le - Proceedings of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Practitioners of data mining and machine learning have long observed that the imbalance of
classes in a data set negatively impacts the quality of classifiers trained on that data …

Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward

P Flach - Proceedings of the AAAI conference on artificial …, 2019 - aaai.org
This paper gives an overview of some ways in which our understanding of performance
evaluation measures for machine-learned classifiers has improved over the last twenty …

C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling

C Drummond, RC Holte - … on learning from imbalanced datasets II, 2003 - eiti.uottawa.ca
This paper takes a new look at two sampling schemes commonly used to adapt machine
learning algorithms to imbalanced classes and misclassification costs. It uses a performance …

The effects of class imbalance and training data size on classifier learning: an empirical study

W Zheng, M Jin - SN Computer Science, 2020 - Springer
This study discusses the effects of class imbalance and training data size on the predictive
performance of classifiers. An empirical study was performed on ten classifiers arising from …

Study on the Impact of Partition-Induced Dataset Shift on k-Fold Cross-Validation

JG Moreno-Torres, JA Sáez… - IEEE transactions on …, 2012 - ieeexplore.ieee.org
Cross-validation is a very commonly employed technique used to evaluate classifier
performance. However, it can potentially introduce dataset shift, a harmful factor that is often …