Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in the medical field, aims to describe the visual
content of images in human language, which requires modeling the semantic relationship …

A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation

AA Khan, O Chaudhari, R Chandra - Expert Systems with Applications, 2023 - Elsevier
Class imbalance (CI) in classification problems arises when the number of observations
belonging to one class is lower than the other. Ensemble learning combines multiple models …

Toward verifiable and reproducible human evaluation for text-to-image generation

M Otani, R Togashi, Y Sawai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human evaluation is critical for validating the performance of text-to-image generative
models, as this highly cognitive process requires deep comprehension of text and images …

Frido: Feature pyramid diffusion for complex scene image synthesis

WC Fan, YC Chen, DD Chen, Y Cheng… - Proceedings of the …, 2023 - ojs.aaai.org
Diffusion models (DMs) have shown great potential for high-quality image synthesis.
However, when it comes to producing images with complex scenes, how to properly …

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Taming encoder for zero fine-tuning image customization with text-to-image diffusion models

X Jia, Y Zhao, KCK Chan, Y Li, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper proposes a method for generating images of customized objects specified by
users. The method is based on a general framework that bypasses the lengthy optimization …

SceneComposer: Any-level semantic image synthesis

Y Zeng, Z Lin, J Zhang, Q Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a new framework for conditional image synthesis from semantic layouts of any
precision levels, ranging from pure text to a 2D semantic canvas with precise shapes. More …

Shape-aware text-driven layered video editing

YC Lee, JZG Jang, YT Chen, E Qiu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Temporal consistency is essential for video editing applications. Existing work on layered
representation of videos allows propagating edits consistently to each frame. These …

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

X Yi, H Xu, H Zhang, L Tang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Image fusion aims to combine information from different source images to create a
comprehensively representative image. Existing fusion methods are typically helpless in …

A comprehensive survey on generative adversarial networks used for synthesizing multimedia content

L Kumar, DK Singh - Multimedia Tools and Applications, 2023 - Springer
GANs are playing an important role in creating and generating a new set of data from the
previously available content. GAN models are impressive in the results for image and video …