Scene graph generation from objects, phrases and region captions

Y Li, W Ouyang, B Zhou, K Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com
Object detection, scene graph generation and region captioning, which are three scene
understanding tasks at different semantic levels, are tied together: scene graphs are …

DenseCap: Fully convolutional localization networks for dense captioning

J Johnson, A Karpathy… - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
We introduce the dense captioning task, which requires a computer vision system to both
localize and describe salient regions in images in natural language. The dense captioning …

Chinese image captioning via fuzzy attention-based DenseNet-BiLSTM

H Lu, R Yang, Z Deng, Y Zhang, G Gao… - ACM Transactions on …, 2021 - dl.acm.org
Chinese image description generation tasks usually have some challenges, such as
single-feature extraction, lack of global information, and lack of detailed description of the image …

Generating sentences by editing prototypes

K Guu, TB Hashimoto, Y Oren, P Liang - Transactions of the …, 2018 - direct.mit.edu
We propose a new generative language model for sentences that first samples a prototype
sentence from the training corpus and then edits it into a new sentence. Compared to …

Guiding the long-short term memory model for image caption generation

X Jia, E Gavves, B Fernando… - Proceedings of the …, 2015 - openaccess.thecvf.com
In this work we focus on the problem of image caption generation. We propose an extension
of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular …

ViP-CNN: Visual phrase guided convolutional neural network

Y Li, W Ouyang, X Wang… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
As the intermediate level task connecting image captioning and object detection, visual
relationship detection started to catch researchers' attention because of its descriptive power …

Know more say less: Image captioning based on scene graphs

X Li, S Jiang - IEEE Transactions on Multimedia, 2019 - ieeexplore.ieee.org
Automatically describing the content of an image has been attracting considerable research
attention in the multimedia field. To represent the content of an image, many approaches …

Recent advances in neural text generation: A task-agnostic survey

C Tang, F Guerin, C Lin - arXiv preprint arXiv:2203.03047, 2022 - arxiv.org
In recent years, considerable research has been dedicated to the application of neural
models in the field of natural language generation (NLG). The primary objective is to …

A retrieve-and-edit framework for predicting structured outputs

TB Hashimoto, K Guu, Y Oren… - Advances in Neural …, 2018 - proceedings.neurips.cc
For the task of generating complex outputs such as source code, editing existing outputs can
be easier than generating complex outputs from scratch. With this motivation, we propose an …

Diverse and accurate image description using a variational auto-encoder with an additive Gaussian encoding space

L Wang, A Schwing, S Lazebnik - Advances in Neural …, 2017 - proceedings.neurips.cc
This paper explores image caption generation using conditional variational auto-encoders
(CVAEs). Standard CVAEs with a fixed Gaussian prior yield descriptions with too little …