Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is …
Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding. Their applicability to multi-modal …
L Huang, W Wang, J Chen… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Attention mechanisms are widely used in current encoder/decoder frameworks of image captioning, where a weighted average on encoded vectors is generated at each time step to …
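The weighted average over encoded vectors described in this snippet can be sketched as follows. This is a minimal NumPy illustration, not any cited paper's actual model: the bilinear scoring function and all names (`attend`, `encoded`, `query`, `W`) are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over attention scores
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(encoded, query, W):
    # one score per encoded region vector (illustrative bilinear form)
    scores = encoded @ W @ query
    weights = softmax(scores)
    # context vector: weighted average of the encoded vectors,
    # recomputed at each decoding time step from the current query
    return weights @ encoded, weights

rng = np.random.default_rng(0)
encoded = rng.standard_normal((5, 8))  # e.g. 5 image regions, dim 8
query = rng.standard_normal(8)         # e.g. decoder hidden state
W = rng.standard_normal((8, 8))
context, weights = attend(encoded, query, W)
```

At each time step the decoder would supply a new `query`, yielding a new set of weights and hence a new context vector over the same encoded image regions.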
J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The limited availability of annotated data often hinders real-world applications of machine learning. To efficiently learn from small quantities of multimodal data, we leverage the …
A Gatt, E Krahmer - Journal of Artificial Intelligence Research, 2018 - jair.org
This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper …
The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from …
A Karpathy, L Fei-Fei - Proceedings of the IEEE conference on …, 2015 - cv-foundation.org
We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to …