Grounding Language Models to Images for Multimodal Inputs and Outputs

JY Koh, R Salakhutdinov, D Fried - Proceedings of the 40th International Conference on Machine Learning, 2023 - proceedings.mlr.press
We propose an efficient method to ground pretrained text-only language models to the
visual domain, enabling them to process arbitrarily interleaved image-and-text data, and …

Grounding Language Models to Images for Multimodal Inputs and Outputs

JY Koh, R Salakhutdinov, D Fried - arXiv preprint arXiv:2301.13823, 2023 - arxiv.org
We propose an efficient method to ground pretrained text-only language models to the
visual domain, enabling them to process arbitrarily interleaved image-and-text data, and …
