Predicting text preference via structured comparative reasoning

JN Yan, T Liu, J Chiu, J Shen, Z Qin, Y Yu… - Proceedings of the …, 2024 - aclanthology.org
Comparative reasoning plays a crucial role in predicting text preferences; however, large
language models (LLMs) often demonstrate inconsistencies in their reasoning, leading to …

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers

Z Zhang, M Feng, Z Li, C Xu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Machine learning models can perform well on in-distribution data but often fail on
biased subgroups that are underrepresented in the training data, hindering the robustness of …

What could go wrong? Discovering and describing failure modes in computer vision

G Csurka, TL Hayes, D Larlus, R Volpi - arXiv preprint arXiv:2408.04471, 2024 - arxiv.org
Deep learning models are effective, yet brittle. Even carefully trained, their behavior tends to
be hard to predict when confronted with out-of-distribution samples. In this work, our goal is …

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

L Dunlap, K Mandal, T Darrell, J Steinhardt… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit subtle yet distinctive characteristics in their
outputs that users intuitively recognize but struggle to quantify. These "vibes" -- such as tone …

Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models

M Moayeri, V Balachandran, V Chandrasekaran… - arXiv preprint arXiv …, 2024 - arxiv.org
With models getting stronger, evaluations have grown more complex, testing multiple skills
in one benchmark and even in the same instance at once. However, skill-wise performance …

Concept Bottleneck Models Without Predefined Concepts

S Schrodi, J Schur, M Argus, T Brox - arXiv preprint arXiv:2407.03921, 2024 - arxiv.org
There has been considerable recent interest in interpretable concept-based models such as
Concept Bottleneck Models (CBMs), which first predict human-interpretable concepts and …

Trustworthy Transfer Learning: A Survey

J Wu, J He - arXiv preprint arXiv:2412.14116, 2024 - arxiv.org
Transfer learning aims to transfer knowledge or information from a source domain to a
relevant target domain. In this paper, we understand transfer learning from the perspectives …

Bayesian Concept Bottleneck Models with LLM Priors

J Feng, A Kothari, L Zier, C Singh, YS Tan - arXiv preprint arXiv …, 2024 - arxiv.org
Concept Bottleneck Models (CBMs) have been proposed as a compromise between
white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy …

CAST: Cross-modal Alignment Similarity Test for Vision Language Models

G Dagan, O Loginova, A Batra - arXiv preprint arXiv:2409.11007, 2024 - arxiv.org
Vision Language Models (VLMs) are typically evaluated with Visual Question Answering
(VQA) tasks which assess a model's understanding of scenes. Good VQA performance is …

Explaining Datasets in Words: Statistical Models with Natural Language Parameters

R Zhong, H Wang, D Klein, J Steinhardt - arXiv preprint arXiv:2409.08466, 2024 - arxiv.org
To make sense of massive data, we often fit simplified models and then interpret the
parameters; for example, we cluster the text embeddings and then interpret the mean …