Aligning where to see and what to tell: Image captioning with region-based attention and...

D Xu, Y Shi, IW Tsang, YS Ong… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

The aim of multi-output learning is to simultaneously predict multiple outputs given an input.
It is an important learning problem for decision-making since making decisions in the real …

Cited by 305 Related articles All 8 versions

[PDF] ieee.org

Automatic image and video caption generation with deep learning: A concise review and algorithmic overlap

S Amirian, K Rasheed, TR Taha, HR Arabnia - IEEE access, 2020 - ieeexplore.ieee.org

Methodologies that utilize Deep Learning offer great potential for applications that
automatically attempt to generate captions or descriptions about images and video frames …

Cited by 81 Related articles All 5 versions

[PDF] thecvf.com

Exploring visual relationship for image captioning

T Yao, Y Pan, Y Li, T Mei - Proceedings of the European …, 2018 - openaccess.thecvf.com

It is always well believed that modeling relationships between objects would be helpful for
representing and eventually describing an image. Nevertheless, there has not been …

Cited by 1076 Related articles All 9 versions

[HTML] nih.gov

Survey on deep neural networks in speech and vision systems

M Alam, MD Samad, L Vidyaratne, A Glandon… - Neurocomputing, 2020 - Elsevier

This survey presents a review of state-of-the-art deep neural network architectures,
algorithms, and systems in speech and vision applications. Recent advances in deep …

Cited by 275 Related articles All 8 versions

[PDF] researchgate.net

A survey on automatic image caption generation

S Bai, S An - Neurocomputing, 2018 - Elsevier

Image captioning means automatically generating a caption for an image. As a recently
emerged research area, it is attracting more and more attention. To achieve the goal of …

Cited by 290 Related articles All 4 versions

Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition

S Xie, H Hu, Y Wu - Pattern recognition, 2019 - Elsevier

Abstract Facial Expression Recognition (FER) has long been a challenging task in the field
of computer vision. In this paper, we present a novel model, named Deep Attentive Multi …

Cited by 224 Related articles All 5 versions

[PDF] cognn.com

Learning visual relationship and context-aware attention for image captioning

J Wang, W Wang, L Wang, Z Wang, DD Feng, T Tan - Pattern Recognition, 2020 - Elsevier

Image captioning which automatically generates natural language descriptions for images
has attracted lots of research attentions and there have been substantial progresses with …

Cited by 141 Related articles All 5 versions

Incorporating attentive multi-scale context information for image captioning

J Prudviraj, Y Sravani, CK Mohan - Multimedia Tools and Applications, 2023 - Springer

In this paper, we propose a novel encoding framework to learn the multi-scale context
information of the visual scene for image captioning task. The devised multi-scale context …

Cited by 51 Related articles All 4 versions

[PDF] thecvf.com

Robust change captioning

DH Park, T Darrell, A Rohrbach - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Describing what has changed in a scene can be useful to a user, but only if generated text
focuses on what is semantically relevant. It is thus important to distinguish distractors (e.g. a …

Cited by 170 Related articles All 4 versions

[PDF] arxiv.org

Changes to captions: An attentive network for remote sensing change captioning

S Chang, P Ghamisi - IEEE Transactions on Image Processing, 2023 - ieeexplore.ieee.org

In recent years, advanced research has focused on the direct learning and analysis of
remote-sensing images using natural language processing (NLP) techniques. The ability to …

Cited by 45 Related articles All 6 versions

