A step toward more inclusive people annotations for fairness

H Baniecki, P Biecek - Information Fusion, 2024 - Elsevier

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging
and trusting statistical and deep learning models, as well as interpreting their predictions …

Cited by 57 - Related articles - All 5 versions

[PDF] thecvf.com

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Cited by 6664 - Related articles - All 12 versions

[PDF] arxiv.org

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

G Team, P Georgiev, VI Lei, R Burnell, L Bai… - arXiv preprint arXiv …, 2024 - arxiv.org

In this report, we introduce the Gemini 1.5 family of models, representing the next generation
of highly compute-efficient multimodal models capable of recalling and reasoning over fine …

Cited by 593 - Related articles - All 4 versions

[PDF] arxiv.org

PaLI: A jointly-scaled multilingual language-image model

X Chen, X Wang, S Changpinyo… - arXiv preprint arXiv …, 2022 - arxiv.org

Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …

Cited by 596 - Related articles - All 6 versions

[PDF] arxiv.org

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

Cited by 109 - Related articles - All 5 versions

[PDF] arxiv.org

PaLI-X: On scaling up a multilingual vision and language model

X Chen, J Djolonga, P Padlewski, B Mustafa… - arXiv preprint arXiv …, 2023 - arxiv.org

We present the training recipe and results of scaling up PaLI-X, a multilingual vision and
language model, both in terms of size of the components and the breadth of its training task …

Cited by 142 - Related articles - All 4 versions

[PDF] thecvf.com

FACET: Fairness in computer vision evaluation benchmark

L Gustafson, C Rolland, N Ravi… - Proceedings of the …, 2023 - openaccess.thecvf.com

Computer vision models have known performance disparities across attributes such as
gender and skin tone. This means during tasks such as classification and detection, model …

Cited by 27 - Related articles - All 6 versions

[PDF] arxiv.org

PaLI-3 vision language models: Smaller, faster, stronger

X Chen, X Wang, L Beyer, A Kolesnikov, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that
compares favorably to similar models that are 10x larger. As part of arriving at this strong …

Cited by 58 - Related articles - All 3 versions

[PDF] arxiv.org

Survey of social bias in vision-language models

N Lee, Y Bang, H Lovenia, S Cahyawijaya… - arXiv preprint arXiv …, 2023 - arxiv.org

In recent years, the rapid advancement of machine learning (ML) models, particularly
transformer-based pre-trained models, has revolutionized Natural Language Processing …

Cited by 7 - Related articles - All 2 versions

[PDF] openreview.net

The Dollar Street dataset: Images representing the geographic and socioeconomic diversity of the world

WAG Rojas, S Diamos, KR Kini, D Kanter… - … -sixth Conference on …, 2022 - openreview.net

It is crucial that image datasets for computer vision are representative and contain accurate
demographic information to ensure their robustness and fairness, especially for smaller …

Cited by 62 - Related articles - All 2 versions
