A critical analysis of metrics used for measuring progress in artificial intelligence

K Blagec, G Dorffner, M Moradi, M Samwald - arXiv preprint arXiv …, 2020 - arxiv.org
Comparing model performances on benchmark datasets is an integral part of measuring
and driving progress in artificial intelligence. A model's performance on a benchmark …

Why comparing single performance scores does not allow to draw conclusions about machine learning approaches

N Reimers, I Gurevych - arXiv preprint arXiv:1803.09578, 2018 - arxiv.org
Developing state-of-the-art approaches for specific tasks is a major driving force in our
research community. Depending on the prestige of the task, publishing it can come along …
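The title's claim can be seen with a minimal Python sketch. It uses hypothetical per-seed test scores for two approaches "A" and "B"; the numbers are invented for illustration and do not come from the paper. A single-run comparison picks one winner. Summarising each approach over all seeds can pick the other.

```python
import statistics

# Hypothetical test scores, one per random seed (invented for illustration only).
scores_a = [0.912, 0.905, 0.921, 0.899, 0.910]
scores_b = [0.915, 0.897, 0.903, 0.894, 0.906]

def summarise(scores):
    """Return (mean, sample standard deviation) over training runs."""
    return statistics.mean(scores), statistics.stdev(scores)

# Comparing only the first run of each approach suggests B is better...
single_run_winner = "A" if scores_a[0] > scores_b[0] else "B"

# ...but the distribution over seeds tells a different story.
mean_a, sd_a = summarise(scores_a)
mean_b, sd_b = summarise(scores_b)
seeds_won_by_a = sum(a > b for a, b in zip(scores_a, scores_b))

print(f"single run: {single_run_winner} wins")
print(f"A: {mean_a:.4f} +/- {sd_a:.4f}   B: {mean_b:.4f} +/- {sd_b:.4f}")
print(f"A beats B on {seeds_won_by_a} of {len(scores_a)} seeds")
```

Five seeds are far too few for a firm conclusion. The point is only that a single score hides seed-to-seed variance.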

Accounting for variance in machine learning benchmarks

X Bouthillier, P Delaunay, M Bronzi… - Proceedings of …, 2021 - proceedings.mlsys.org
Strong empirical evidence that one machine-learning algorithm A outperforms another one
B, ideally calls for multiple trials optimizing the learning pipeline over sources of variation …
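One way to act on "multiple trials over sources of variation" is to run both pipelines on paired trials and estimate the probability that A outperforms B, with a bootstrap interval. The Python sketch below assumes this approach. `train_and_evaluate` is a hypothetical stand-in that adds simulated noise to an assumed underlying skill. It is not the paper's code and it does not train a real model.

```python
import random

def train_and_evaluate(true_skill, split_seed, init_seed):
    """Hypothetical pipeline stand-in: assumed skill plus simulated variation.

    split_seed models variation shared by both algorithms in a trial
    (e.g. the data split); init_seed models algorithm-specific variation
    (e.g. weight initialisation, data order).
    """
    split_effect = random.Random(split_seed).gauss(0.0, 0.01)
    init_effect = random.Random(init_seed).gauss(0.0, 0.005)
    return true_skill + split_effect + init_effect

def prob_outperform(scores_a, scores_b):
    """Fraction of paired trials in which A scores higher than B."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    return wins / len(scores_a)

def bootstrap_ci(scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for P(A > B), resampling whole trials."""
    rng = random.Random(seed)
    n = len(scores_a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(prob_outperform([scores_a[i] for i in idx],
                                     [scores_b[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]

seeds = range(50)
# Each trial shares its split seed between A and B; init seeds differ.
scores_a = [train_and_evaluate(0.905, s, 10_000 + s) for s in seeds]
scores_b = [train_and_evaluate(0.900, s, 20_000 + s) for s in seeds]

p = prob_outperform(scores_a, scores_b)
lo, hi = bootstrap_ci(scores_a, scores_b)
print(f"P(A > B) = {p:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Pairing trials removes the shared split variation from the comparison, so the estimate reflects the difference between the algorithms rather than luck in the split.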

PerMetrics: A framework of performance metrics for machine learning models

N Van Thieu - Journal of Open Source Software, 2024 - joss.theoj.org
Performance metrics are pivotal in machine learning field, especially for tasks like
regression, classification, and clustering (Saura, 2021). They offer quantitative measures to …
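The following Python sketch shows three common regression metrics using their standard textbook definitions. It is plain Python and does not reproduce the PerMetrics library's API. The sample values are a toy example chosen for easy hand-checking.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(f"MAE={mae(y_true, y_pred):.3f}  "
      f"RMSE={rmse(y_true, y_pred):.3f}  R2={r2(y_true, y_pred):.3f}")
```

RMSE penalises the single error of 1.0 more heavily than MAE does. This is one reason reports often quote more than one metric.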

Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward

P Flach - Proceedings of the AAAI conference on artificial …, 2019 - aaai.org
This paper gives an overview of some ways in which our understanding of performance
evaluation measures for machine-learned classifiers has improved over the last twenty …

Mapping global dynamics of benchmark creation and saturation in artificial intelligence

S Ott, A Barbosa-Silva, K Blagec, J Brauner… - Nature …, 2022 - nature.com
Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI).
However, recent studies raised concerns over the state of AI benchmarking, reporting issues …

Insights into performance fitness and error metrics for machine learning

MZ Naser, A Alavi - arXiv preprint arXiv:2006.00887, 2020 - arxiv.org
Machine learning (ML) is the field of training machines to achieve high level of cognition and
perform human-like analysis. Since ML is a data-driven approach, it seemingly fits into our …

On over-fitting in model selection and subsequent selection bias in performance evaluation

GC Cawley, NLC Talbot - The Journal of Machine Learning Research, 2010 - jmlr.org
Model selection strategies for machine learning algorithms typically involve the
numerical optimisation of an appropriate model selection criterion, often based on an …
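The selection bias named in this title can be simulated in a few lines of Python. The sketch assumes 100 hypothetical candidate models that all have the same true accuracy. Each candidate is scored on a finite validation set, and the candidate with the best validation score is kept. That winning score overstates the true accuracy because the selection favoured lucky noise. These settings are illustrative assumptions, not the paper's experimental setup.

```python
import random

def noisy_accuracy(true_acc, n_examples, rng):
    """Accuracy measured on n_examples, each classified correctly with prob true_acc."""
    return sum(rng.random() < true_acc for _ in range(n_examples)) / n_examples

rng = random.Random(42)
TRUE_ACC = 0.70      # assumption: every candidate is equally good
N_CANDIDATES = 100   # e.g. hyperparameter settings tried during selection
N_VAL = N_TEST = 200

# Model selection: pick the candidate with the highest validation score.
val_scores = [noisy_accuracy(TRUE_ACC, N_VAL, rng) for _ in range(N_CANDIDATES)]
best = max(range(N_CANDIDATES), key=val_scores.__getitem__)
selected_val_score = val_scores[best]

# Re-scoring the chosen model on data not used for selection removes that bias.
fresh_test_score = noisy_accuracy(TRUE_ACC, N_TEST, rng)

print(f"validation score of selected model: {selected_val_score:.3f}")
print(f"score on independent test data:     {fresh_test_score:.3f}")
print(f"true accuracy:                      {TRUE_ACC:.3f}")
```

A common safeguard is to keep the data used for selection separate from the data used for the final estimate, for example with nested cross-validation.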

OpenML benchmarking suites

B Bischl, G Casalicchio, M Feurer, P Gijsbers… - arXiv preprint arXiv …, 2017 - arxiv.org
Machine learning research depends on objectively interpretable, comparable, and
reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites …

MLcps: machine learning cumulative performance score for classification problems

A Akshay, M Abedi, N Shekarchizadeh… - …, 2023 - academic.oup.com
Background Assessing the performance of machine learning (ML) models requires careful
consideration of the evaluation metrics used. It is often necessary to utilize multiple metrics …
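The Python sketch below shows why several metrics are often needed. It computes common binary-classification metrics from confusion-matrix counts for a hypothetical, heavily imbalanced test set. The counts are invented. The code does not implement the MLcps cumulative score, only the individual metrics such a score might combine.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical imbalanced test set: 10 positives, 990 negatives.
m = classification_metrics(tp=2, fp=3, fn=8, tn=987)
for name, value in m.items():
    print(f"{name:>9}: {value:.3f}")
```

Accuracy is close to 0.99 here even though the classifier finds only 2 of the 10 positives. Recall, F1 and MCC expose the failure.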