SpaBERT: A pretrained language model from geographic data for geo-entity representation

Z Li, J Kim, YY Chiang, M Chen - arXiv preprint arXiv:2210.12213, 2022 - arxiv.org
Named geographic entities (geo-entities for short) are the building blocks of many
geographic datasets. Characterizing geo-entities is integral to various application domains …

MAQA: A multimodal QA benchmark for negation

JY Li, A Jansen, Q Huang, J Lee, R Ganti… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal learning can benefit from the representation power of pretrained Large
Language Models (LLMs). However, state-of-the-art transformer-based LLMs often ignore …

Enriching unsupervised user embedding via medical concepts

X Huang, F Dernoncourt… - Conference on Health …, 2022 - proceedings.mlr.press
Clinical notes in Electronic Health Records (EHR) present rich documented
information about patients for inferring phenotypes for disease diagnosis and studying patient …

Visual-Linguistic Dependency Encoding for Image-Text Retrieval

W Guo, L Zhang, K Zhang, Y Liu… - Proceedings of the 2024 …, 2024 - aclanthology.org
Image-text retrieval is a fundamental task for bridging the semantic gap between natural
language and vision. Recent works primarily focus on aligning textual meanings with visual …

Learning, reasoning, and compositional generalisation in multimodal language models

A Dahlgren Lindström - 2024 - diva-portal.org
We humans learn language and how to interact with the world through our different senses,
grounding our language in what we can see, touch, hear, and smell. We call these streams …