Confidence interval for micro-averaged F1 and macro-averaged F1 scores

K Takahashi, K Yamamoto, A Kuchiba, T Koyama - Applied Intelligence, 2022 - Springer
A binary classification problem is common in medical field, and we often use sensitivity,
specificity, accuracy, negative and positive predictive values as measures of performance of …

About the relationship between ROC curves and Cohen's kappa

A Ben-David - Engineering Applications of Artificial Intelligence, 2008 - Elsevier
Receiver operating characteristic (ROC) curves are very powerful tools for measuring
classifiers' accuracy in binary-class problems. However, their usefulness in real-world multi …

Cross-validation and bootstrapping are unreliable in small sample classification

A Isaksson, M Wallman, H Göransson… - Pattern Recognition …, 2008 - Elsevier
The interest in statistical classification for critical applications such as diagnoses of patient
samples based on supervised learning is rapidly growing. To gain acceptance in …

Comparison of classification accuracy using Cohen's Weighted Kappa

A Ben-David - Expert Systems with Applications, 2008 - Elsevier
Many expert systems solve classification problems. While comparing the accuracy of such
classifiers, the cost of error must frequently be taken into account. In such cost-sensitive …

Evaluating classification accuracy for modern learning approaches

J Li, M Gao, R D'Agostino - Statistics in Medicine, 2019 - Wiley Online Library
Deep learning neural network models such as multilayer perceptron (MLP) and
convolutional neural network (CNN) are novel and attractive artificial intelligence computing …

Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory

J Kruppa, Y Liu, G Biau, M Kohler, IR König… - Biometrical …, 2014 - Wiley Online Library
Probability estimation for binary and multicategory outcome using logistic and multinomial
logistic regression has a long‐standing tradition in biostatistics. However, biases may occur …

Reliable Accuracy Estimates from k-Fold Cross Validation

TT Wong, PY Yeh - IEEE Transactions on Knowledge and Data …, 2019 - ieeexplore.ieee.org
It is popular to evaluate the performance of classification algorithms by k-fold cross
validation. A reliable accuracy estimate will have a relatively small variance, and several …

On the Bayes-optimality of F-measure maximizers

W Waegeman, K Dembczyński, A Jachnik… - Journal of Machine …, 2014 - jmlr.org
The F-measure, which has originally been introduced in information retrieval, is nowadays
routinely used as a performance metric for problems such as binary classification, multi-label …

Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates

E LeDell, M Petersen… - Electronic Journal of …, 2015 - ncbi.nlm.nih.gov
In binary classification problems, the area under the ROC curve (AUC) is commonly used to
evaluate the performance of a prediction model. Often, it is combined with cross-validation in …

The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification

D Chicco, G Jurman - BioData Mining, 2023 - Springer
Binary classification is a common task for which machine learning and computational
statistics are used, and the area under the receiver operating characteristic curve (ROC …