Connecting Vision and Language with Localized Narratives

J Pont-Tuset, J Uijlings, S Changpinyo… - Computer Vision–ECCV …, 2020 - Springer
We propose Localized Narratives, a new form of multimodal image annotations
connecting vision and language. We ask annotators to describe an image with their voice …

Connecting Vision and Language with Localized Narratives

J Pont-Tuset, J Uijlings, S Changpinyo… - arXiv e …, 2019 - ui.adsabs.harvard.edu
We propose Localized Narratives, a new form of multimodal image annotations
connecting vision and language. We ask annotators to describe an image with their voice …

Connecting Vision and Language with Localized Narratives

J Pont-Tuset, J Uijlings, S Changpinyo, R Soricut… - research.google
We propose Localized Narratives, a new form of multimodal image annotations
connecting vision and language. We ask annotators to describe an image with their voice …

Connecting Vision and Language with Localized Narratives

J Pont-Tuset, J Uijlings, S Changpinyo… - … on Computer Vision, 2020 - dl.acm.org
We propose Localized Narratives, a new form of multimodal image annotations
connecting vision and language. We ask annotators to describe an image with their voice …

Connecting Vision and Language with Localized Narratives

J Pont-Tuset, J Uijlings, S Changpinyo, R Soricut… - ecva.net
We propose Localized Narratives, a new form of multimodal image annotations connecting
vision and language. We ask annotators to describe an image with their voice while …

Connecting Vision and Language with Localized Narratives

J Pont-Tuset, J Uijlings, S Changpinyo… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose Localized Narratives, a new form of multimodal image annotations connecting
vision and language. We ask annotators to describe an image with their voice while …
