Adaptive calibrator ensemble: Navigating test set difficulty in out-of-distribution scenarios

Y Zou, W Deng, L Zheng - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Model calibration usually requires optimizing some parameters (e.g., temperature)
w.r.t. an objective function like negative log-likelihood. This work uncovers a significant aspect …

Energy-based automated model evaluation

R Peng, H Zou, H Wang, Y Zeng, Z Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The conventional evaluation protocols on machine learning models rely heavily on a
labeled, i.i.d.-assumed testing dataset, which is not often present in real world applications …

CIFAR-10-Warehouse: Broad and more realistic testbeds in model generalization analysis

X Sun, X Leng, Z Wang, Y Yang, Z Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Analyzing model performance in various unseen environments is a critical research problem
in the machine learning community. To study this problem, it is important to construct a …

Quantifying the hardness of bioactivity prediction tasks for transfer learning

H Fooladi, S Hirte, J Kirchmair - Journal of Chemical Information …, 2024 - ACS Publications
Today, machine learning methods are widely employed in drug discovery. However, the
chronic lack of data continues to hamper their further development, validation, and …

Poor-supervised evaluation for SuperLLM via mutual consistency

P Yuan, S Feng, Y Li, X Wang, B Pan, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The guidance from capability evaluations has greatly propelled the progress of both human
society and Artificial Intelligence. However, as LLMs evolve, it becomes challenging to …

What You See Is What You Get: Experience Ranking with Deep Neural Dataset-to-Dataset Similarity for Topological Localisation

M Gadd, B Ramtoula, D De Martini… - … on Experimental Robotics, 2023 - Springer
Recalling the most relevant visual memories for localisation or understanding a priori the
likely outcome of localisation effort against a particular visual memory is useful for efficient …

Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels

W Tu, W Deng, D Campbell, Y Yao, J Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
As large multimodal models (LMMs) are increasingly deployed across diverse applications,
the need for adaptable, real-world model ranking has become paramount. Traditional …

What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?

W Tu, W Deng, L Zheng, T Gedeon - arXiv preprint arXiv:2406.09908, 2024 - arxiv.org
This work aims to develop a measure that can accurately rank the performance of various
classifiers when they are tested on unlabeled data from out-of-distribution (OOD) …

VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition

B Ramtoula, D De Martini, M Gadd… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper adapts a general dataset representation technique to produce robust Visual
Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation …

Key Factors Determining the Required Number of Training Images in Person Re-Identification

T Sasaki, AS Walmsley, K Adachi, S Enomoto… - IEEE …, 2024 - ieeexplore.ieee.org
Focusing on person re-identification datasets, this paper proposes a new method to estimate
the test accuracy curve over the training image number in a precise, interpretable, and …