Image-text surgery: Efficient concept learning in image captioning by generating pseudopairs

D Xu, Y Shi, IW Tsang, YS Ong… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

The aim of multi-output learning is to simultaneously predict multiple outputs given an input.
It is an important learning problem for decision-making since making decisions in the real …

Cited by 303 Related articles All 8 versions

[PDF] aber.ac.uk

Region-object relation-aware dense captioning via transformer

Z Shao, J Han, D Marnerides… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Dense captioning provides detailed captions of complex visual scenes. While a number of
successes have been achieved in recent years, there are still two broad limitations: 1) most …

Cited by 136 Related articles All 9 versions

Long-term video question answering via multimodal hierarchical memory attentive networks

T Yu, J Yu, Z Yu, Q Huang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Long-term Video Question Answering plays an essential role in visual information retrieval,
which aims at generating natural language answers to discretionary free-form questions …

Cited by 58 Related articles All 2 versions

[PDF] arxiv.org

Fine-grained visual–text prompt-driven self-training for open-vocabulary object detection

Y Long, J Han, R Huang, H Xu, Y Zhu… - … on Neural Networks …, 2023 - ieeexplore.ieee.org

Inspired by the success of vision–language methods (VLMs) in zero-shot classification,
recent works attempt to extend this line of work into object detection by leveraging the …

Cited by 21 Related articles All 5 versions

Compositional attention networks with two-stream fusion for video question answering

T Yu, J Yu, Z Yu, D Tao - IEEE Transactions on Image …, 2019 - ieeexplore.ieee.org

Given a video, Video Question Answering (VideoQA) aims at answering arbitrary free-form
questions about the video content in natural language. A successful VideoQA framework …

Cited by 50 Related articles All 4 versions

[PDF] arxiv.org

End-to-end supermask pruning: Learning to prune image captioning models

JH Tan, CS Chan, JH Chuah - Pattern Recognition, 2022 - Elsevier

With the advancement of deep models, research work on image captioning has led to a
remarkable gain in raw performance over the last decade, along with increasing model …

Cited by 19 Related articles All 8 versions

Evolution of automatic visual description techniques-a methodological survey

A Bhowmik, S Kumar, N Bhat - Multimedia Tools and Applications, 2021 - Springer

Describing the contents and activities in an image or video in semantically and syntactically
correct sentences are known as captioning. Automated captioning is one of the most …

Cited by 23 Related articles All 5 versions

Semantic-Aware Dynamic Generation Networks for Few-Shot Human–Object Interaction Recognition

Z Ji, P An, X Liu, C Gao, Y Pang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Recognizing human–object interaction (HOI) aims at inferring various relationships between
actions and objects. Although great progress in HOI has been made, the long-tail problem …

Cited by 4 Related articles All 3 versions

Image captioning using hybrid LSTM-RNN with deep features

KP Deorukhkar, S Ket - Sensing and Imaging, 2022 - Springer

Automated image captioning is the process of creating textual, human-like subtitles or
explanations for photos based on their content. Throughout the image captioning problem …

Cited by 12 Related articles All 2 versions

A comprehensive survey on deep-learning-based visual captioning

B Xin, N Xu, Y Zhai, T Zhang, Z Lu, J Liu, W Nie, X Li… - Multimedia …, 2023 - Springer

Generating a description for an image/video is termed as the visual captioning task. It
requires the model to capture the semantic information of visual content and translate them …

Cited by 1 Related articles All 2 versions
