Groupcap: Group-based image captioning with structured relevance and diversity constraints

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 330 - Related articles - All 11 versions

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier

Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Cited by 25 - Related articles - All 2 versions

Mirrorgan: Learning text-to-image generation by redescription

T Qiao, J Zhang, D Xu, D Tao - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Generating an image from a given text description has two goals: visual realism and
semantic consistency. Although significant progress has been made in generating high …

Cited by 656 - Related articles - All 9 versions

Chinese image captioning via fuzzy attention-based DenseNet-BiLSTM

H Lu, R Yang, Z Deng, Y Zhang, G Gao… - ACM Transactions on …, 2021 - dl.acm.org

Chinese image description generation tasks usually have some challenges, such as single-
feature extraction, lack of global information, and lack of detailed description of the image …

Cited by 124 - Related articles

Beyond a pre-trained object detector: Cross-modal textual and visual context for image captioning

CW Kuo, Z Kira - Proceedings of the IEEE/CVF conference …, 2022 - openaccess.thecvf.com

Significant progress has been made on visual captioning, largely relying on pre-trained
features and later fixed object detectors that serve as rich inputs to auto-regressive models …

Cited by 57 - Related articles - All 5 versions

Deconfounded image captioning: A causal retrospect

X Yang, H Zhang, J Cai - IEEE Transactions on Pattern …, 2021 - ieeexplore.ieee.org

Dataset bias in vision-language tasks is becoming one of the main problems which hinders
the progress of our community. Existing solutions lack a principled analysis about why …

Cited by 142 - Related articles - All 9 versions

NWPU-captions dataset and MLCA-net for remote sensing image captioning

Q Cheng, H Huang, Y Xu, Y Zhou, H Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Recently, the burgeoning demands for captioning-related applications have inspired great
endeavors in the remote sensing community. However, current benchmark datasets are …

Cited by 48 - Related articles - All 2 versions

Reasoning visual dialogs with structural and partial observations

Z Zheng, W Wang, S Qi, SC Zhu - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

We propose a novel model to address the task of Visual Dialog which exhibits complex
dialog structures. To obtain a reasonable answer based on the current question and the …

Cited by 138 - Related articles - All 9 versions

Multi-level policy and reward-based deep reinforcement learning framework for image captioning

N Xu, H Zhang, AA Liu, W Nie, Y Su… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org

Image captioning is one of the most challenging tasks in AI because it requires an
understanding of both complex visuals and natural language. Because image captioning is …

Cited by 102 - Related articles

Dense relational captioning: Triple-stream networks for relationship-based captioning

DJ Kim, J Choi, TH Oh… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Our goal in this work is to train an image captioning model that generates more dense and
informative captions. We introduce "relational captioning," a novel image captioning task …

Cited by 105 - Related articles - All 8 versions
