Measuring social biases in grounded vision and language embeddings

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 124 Related articles All 2 versions

[PDF] acm.org

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 9 Related articles

[PDF] arxiv.org

Easily accessible text-to-image generation amplifies demographic stereotypes at large scale

F Bianchi, P Kalluri, E Durmus, F Ladhak… - Proceedings of the …, 2023 - dl.acm.org

Machine learning models that convert user-written text descriptions into images are now
widely available online and used by millions of users to generate millions of images a day …

Cited by 178 Related articles All 4 versions

[HTML] sciencedirect.com

CPT: Colorful prompt tuning for pre-trained vision-language models

Y Yao, A Zhang, Z Zhang, Z Liu, TS Chua, M Sun - AI Open, 2024 - Elsevier

Abstract Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …

Cited by 215 Related articles All 4 versions

[PDF] arxiv.org

Language (technology) is power: A critical survey of "bias" in NLP

SL Blodgett, S Barocas, H Daumé III… - arXiv preprint arXiv …, 2020 - arxiv.org

We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are
often vague, inconsistent, and lacking in normative reasoning, despite the fact that …

Cited by 1096 Related articles All 5 versions

[PDF] arxiv.org

Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study

Y Cao, L Zhou, S Lee, L Cabello, M Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

The recent release of ChatGPT has garnered widespread recognition for its exceptional
ability to generate human-like responses in dialogue. Given its usage by users from various …

Cited by 90 Related articles All 5 versions

[PDF] arxiv.org

CM3: A causal masked multimodal model of the internet

A Aghajanyan, B Huang, C Ross, V Karpukhin… - arXiv preprint arXiv …, 2022 - arxiv.org

We introduce CM3, a family of causally masked generative models trained over a large
corpus of structured multi-modal documents that can contain both text and image tokens …

Cited by 132 Related articles All 2 versions

[PDF] thecvf.com

DALL-Eval: Probing the reasoning skills and social biases of text-to-image generation models

J Cho, A Zala, M Bansal - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Recently, DALL-E, a multimodal transformer language model, and its variants including
diffusion models have shown high-quality text-to-image generation capabilities. However …

Cited by 82 Related articles All 5 versions

[PDF] arxiv.org

Debiased contrastive learning of unsupervised sentence representations

K Zhou, B Zhang, WX Zhao, JR Wen - arXiv preprint arXiv:2205.00656, 2022 - arxiv.org

Recently, contrastive learning has been shown to be effective in improving pre-trained
language models (PLM) to derive high-quality sentence representations. It aims to pull close …

Cited by 89 Related articles All 4 versions

[HTML] nih.gov

MultiBench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - ncbi.nlm.nih.gov

Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …

Cited by 129 Related articles All 9 versions
