Improved image captioning via policy gradient optimization of spider

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …

被引用次数：298 相关文章所有 11 个版本

[PDF] arxiv.org

A survey of natural language generation

C Dong, Y Li, H Gong, M Chen, J Li, Y Shen… - ACM Computing …, 2022 - dl.acm.org

This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …

被引用次数：176 相关文章所有 4 个版本

[PDF] thecvf.com

Meshed-memory transformer for image captioning

M Cornia, M Stefanini, L Baraldi… - Proceedings of the …, 2020 - openaccess.thecvf.com

Transformer-based architectures represent the state of the art in sequence modeling tasks
like machine translation and language understanding. Their applicability to multi-modal …

被引用次数：1026 相关文章所有 13 个版本

[PDF] arxiv.org

Wavcaps: A chatgpt-assisted weakly-labelled audio captioning dataset for audio-language multimodal research

X Mei, C Meng, H Liu, Q Kong, T Ko… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …

被引用次数：78 相关文章所有 3 个版本

[PDF] arxiv.org

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org

In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

被引用次数：225 相关文章所有 4 个版本

[PDF] arxiv.org

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

被引用次数：885 相关文章所有 8 个版本

[PDF] thecvf.com

Visualgpt: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

被引用次数：164 相关文章所有 12 个版本

[PDF] thecvf.com

Image generation from scene graphs

J Johnson, A Gupta, L Fei-Fei - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

To truly understand the visual world our models should be able not only to recognize images
but also generate them. To this end, there has been exciting recent progress on gen-erating …

被引用次数：894 相关文章所有 10 个版本

[PDF] thecvf.com

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com

Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …

被引用次数：5106 相关文章所有 16 个版本

[PDF] arxiv.org

Deep reinforcement learning: An overview

Y Li - arXiv preprint arXiv:1701.07274, 2017 - arxiv.org

We give an overview of recent exciting achievements of deep reinforcement learning (RL).
We discuss six core elements, six important mechanisms, and twelve applications. We start …

被引用次数：1835 相关文章所有 6 个版本

高级搜索

QQ 群