Survey on multi-output learning

D Xu, Y Shi, IW Tsang, YS Ong… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
The aim of multi-output learning is to simultaneously predict multiple outputs given an input.
It is an important learning problem for decision-making since making decisions in the real …

Region-object relation-aware dense captioning via transformer

Z Shao, J Han, D Marnerides… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Dense captioning provides detailed captions of complex visual scenes. While a number of
successes have been achieved in recent years, there are still two broad limitations: 1) most …

Long-term video question answering via multimodal hierarchical memory attentive networks

T Yu, J Yu, Z Yu, Q Huang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Long-term Video Question Answering plays an essential role in visual information retrieval,
which aims at generating natural language answers to discretionary free-form questions …

Fine-grained visual–text prompt-driven self-training for open-vocabulary object detection

Y Long, J Han, R Huang, H Xu, Y Zhu… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Inspired by the success of vision–language methods (VLMs) in zero-shot classification,
recent works attempt to extend this line of work into object detection by leveraging the …

Compositional attention networks with two-stream fusion for video question answering

T Yu, J Yu, Z Yu, D Tao - IEEE Transactions on Image …, 2019 - ieeexplore.ieee.org
Given a video, Video Question Answering (VideoQA) aims at answering arbitrary free-form
questions about the video content in natural language. A successful VideoQA framework …

End-to-end supermask pruning: Learning to prune image captioning models

JH Tan, CS Chan, JH Chuah - Pattern Recognition, 2022 - Elsevier
With the advancement of deep models, research work on image captioning has led to a
remarkable gain in raw performance over the last decade, along with increasing model …

Evolution of automatic visual description techniques-a methodological survey

A Bhowmik, S Kumar, N Bhat - Multimedia Tools and Applications, 2021 - Springer
Describing the contents and activities in an image or video in semantically and syntactically
correct sentences is known as captioning. Automated captioning is one of the most …

Semantic-Aware Dynamic Generation Networks for Few-Shot Human–Object Interaction Recognition

Z Ji, P An, X Liu, C Gao, Y Pang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Recognizing human–object interaction (HOI) aims at inferring various relationships between
actions and objects. Although great progress in HOI has been made, the long-tail problem …

Image captioning using hybrid LSTM-RNN with deep features

KP Deorukhkar, S Ket - Sensing and Imaging, 2022 - Springer
Automated image captioning is the process of creating textual, human-like subtitles or
explanations for photos based on their content. Throughout the image captioning problem …

A comprehensive survey on deep-learning-based visual captioning

B Xin, N Xu, Y Zhai, T Zhang, Z Lu, J Liu, W Nie, X Li… - Multimedia …, 2023 - Springer
Generating a description for an image/video is termed as the visual captioning task. It
requires the model to capture the semantic information of visual content and translate them …