Dual attention on pyramid feature maps for image captioning

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier

Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Cited by 17 - Related articles - All 2 versions

[PDF] arxiv.org

DilateFormer: Multi-scale dilated transformer for visual recognition

J Jiao, YM Tang, KY Lin, Y Gao, AJ Ma… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

As a de facto solution, the vanilla Vision Transformers (ViTs) are encouraged to model long-
range dependencies between arbitrary image patches while the global attended receptive …

Cited by 51 - Related articles - All 4 versions

[PDF] mdpi.com

Image forgery detection using deep learning by recompressing images

SS Ali, II Ganapathi, NS Vu, SD Ali, N Saxena… - Electronics, 2022 - mdpi.com

Capturing images has been increasingly popular in recent years, owing to the widespread
availability of cameras. Images are essential in our daily lives because they contain a wealth …

Cited by 67 - Related articles - All 7 versions

[PDF] arxiv.org

Geometry Attention Transformer with position-aware LSTMs for image captioning

C Wang, Y Shen, L Ji - Expert Systems with Applications, 2022 - Elsevier

In recent years, Transformer structures have been widely applied in image captioning with
impressive performance. However, previous works often neglect the geometry and position …

Cited by 42 - Related articles - All 4 versions

[PDF] arxiv.org

Hierarchical local-global transformer for temporal sentence grounding

X Fang, D Liu, P Zhou, Z Xu, R Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …

Cited by 25 - Related articles - All 4 versions

Semi-supervised medical report generation via graph-guided hybrid feature consistency

K Zhang, H Jiang, J Zhang, Q Huang… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Medical report generation generates the corresponding report according to the given
radiology image, which has been attracting increasing research interest. However, existing …

Cited by 13 - Related articles - All 2 versions

Visual cluster grounding for image captioning

W Jiang, M Zhu, Y Fang, G Shi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Attention mechanisms have been extensively adopted in vision and language tasks such as
image captioning. It encourages a captioning model to dynamically ground appropriate …

Cited by 24 - Related articles - All 5 versions

[PDF] springer.com

Image caption generation using visual attention prediction and contextual spatial relation extraction

R Sasibhooshan, S Kumaraswamy, S Sasidharan - Journal of Big Data, 2023 - Springer

Automatic caption generation with attention mechanisms aims at generating more
descriptive captions containing coarser to finer semantic contents in the image. In this work …

Cited by 18 - Related articles - All 7 versions

Transformer-based local-global guidance for image captioning

H Parvin, AR Naghsh-Nilchi, HM Mohammadi - Expert Systems with …, 2023 - Elsevier

Image captioning is a difficult problem for machine learning algorithms to compress huge
amounts of images into descriptive languages. The recurrent models are popularly used as …

Cited by 12 - Related articles - All 2 versions

[PDF] arxiv.org

HGAN: Hierarchical graph alignment network for image-text retrieval

J Guo, M Wang, Y Zhou, B Song, Y Chi… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Image-text retrieval (ITR) is a challenging task in the field of multimodal information
processing due to the semantic gap between different modalities. In recent years …

Cited by 14 - Related articles - All 4 versions
