Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward

P Flach - Proceedings of the AAAI conference on artificial …, 2019 - aaai.org
This paper gives an overview of some ways in which our understanding of performance
evaluation measures for machine-learned classifiers has improved over the last twenty …

Why question machine learning evaluation methods

N Japkowicz - AAAI workshop on evaluation methods for machine …, 2006 - cdn.aaai.org
The evaluation of classifiers or learning algorithms is not a topic that has, generally, been
given much thought in the fields of Machine Learning and Data Mining. More often than not …

Improving the practice of classifier performance assessment

NM Adams, DJ Hand - Neural computation, 2000 - direct.mit.edu
In this note we use examples from the literature to illustrate some poor practices in
assessing the performance of supervised classification rules, and we suggest guidelines for …

An experimental comparison of performance measures for classification

C Ferri, J Hernández-Orallo, R Modroiu - Pattern recognition letters, 2009 - Elsevier
Performance metrics in classification are fundamental in assessing the quality of learning
methods and learned models. However, many different measures have been defined in the …

On over-fitting in model selection and subsequent selection bias in performance evaluation

GC Cawley, NLC Talbot - The Journal of Machine Learning Research, 2010 - jmlr.org
Model selection strategies for machine learning algorithms typically involve the
numerical optimisation of an appropriate model selection criterion, often based on an …

On applied research in machine learning

F Provost, R Kohavi - Machine Learning, 1998 - ai.stanford.edu
Common arguments for including applications papers in the Machine Learning literature are
often based on the papers' value for advertising success stories and for morale boosting. For …

Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation

M Sokolova, N Japkowicz, S Szpakowicz - Australasian joint conference on …, 2006 - Springer
Different evaluation measures assess different characteristics of machine learning
algorithms. The empirical evaluation of algorithms and classifiers is a matter of on-going …

Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement

G Forman, M Scholz - ACM SIGKDD Explorations Newsletter, 2010 - dl.acm.org
Cross-validation is a mainstay for measuring performance and progress in machine
learning. There are subtle differences in how exactly to compute accuracy, F-measure and …

A lot of randomness is hiding in accuracy

A Ben-David - Engineering Applications of Artificial Intelligence, 2007 - Elsevier
The proportion of successful hits, usually referred to as “accuracy”, is by far the most
dominant meter for measuring classifiers' accuracy. This is despite the fact that accuracy …

A novel performance measure for machine learning classification

M Gong - … Journal of Managing Information Technology (IJMIT) …, 2021 - papers.ssrn.com
Machine learning models have been widely used in numerous classification
problems and performance measures play a critical role in machine learning model …