Visually grounded concept composition

Articles citing this work: 5 results

[PDF] arxiv.org

SpaBERT: A pretrained language model from geographic data for geo-entity representation

Z Li, J Kim, YY Chiang, M Chen - arXiv preprint arXiv:2210.12213, 2022 - arxiv.org

Named geographic entities (geo-entities for short) are the building blocks of many
geographic datasets. Characterizing geo-entities is integral to various application domains …

Cited by 21 · Related articles · All 5 versions

[PDF] arxiv.org

Maqa: A multimodal qa benchmark for negation

JY Li, A Jansen, Q Huang, J Lee, R Ganti… - arXiv preprint arXiv …, 2023 - arxiv.org

Multimodal learning can benefit from the representation power of pretrained Large
Language Models (LLMs). However, state-of-the-art transformer based LLMs often ignore …

Cited by 6 · Related articles · All 4 versions

[PDF] mlr.press

Enriching unsupervised user embedding via medical concepts

X Huang, F Dernoncourt… - Conference on Health …, 2022 - proceedings.mlr.press

Clinical notes in Electronic Health Records (EHR) present rich documented
information of patients to inference phenotype for disease diagnosis and study patient …

Cited by 3 · Related articles · All 6 versions

[PDF] aclanthology.org

Visual-Linguistic Dependency Encoding for Image-Text Retrieval

W Guo, L Zhang, K Zhang, Y Liu… - Proceedings of the 2024 …, 2024 - aclanthology.org

Image-text retrieval is a fundamental task to bridge the semantic gap between natural
language and vision. Recent works primarily focus on aligning textual meanings with visual …

[PDF] diva-portal.org

Learning, reasoning, and compositional generalisation in multimodal language models

A Dahlgren Lindström - 2024 - diva-portal.org

We humans learn language and how to interact with the world through our different senses,
grounding our language in what we can see, touch, hear, and smell. We call these streams …
