A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Null-text inversion for editing real images using guided diffusion models

R Mokady, A Hertz, K Aberman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent large-scale text-guided diffusion models provide powerful image generation
capabilities. Currently, substantial effort is devoted to enabling the modification of these images …

Guiding pretraining in reinforcement learning with large language models

Y Du, O Watkins, Z Wang, C Colas… - International …, 2023 - proceedings.mlr.press
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped
reward function. Intrinsically motivated exploration methods address this limitation by …

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org
Image captioning is a fundamental task in vision-language understanding, in which a model
predicts an informative textual caption for a given input image. In this paper, we present a …

4D-fy: Text-to-4D generation using hybrid score distillation sampling

S Bahmani, I Skorokhodov, V Rong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and
text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a …

Translation between molecules and natural language

C Edwards, T Lai, K Ros, G Honke, K Cho… - arXiv preprint arXiv …, 2022 - arxiv.org
We present MolT5, a self-supervised learning framework for pretraining models on a
vast amount of unlabeled natural language text and molecule strings. MolT5 …

Quality not quantity: On the interaction between dataset design and robustness of CLIP

T Nguyen, G Ilharco, M Wortsman… - Advances in Neural …, 2022 - proceedings.neurips.cc
Web-crawled datasets have enabled remarkable generalization capabilities in recent
image-text models such as CLIP (Contrastive Language-Image Pre-training) or Flamingo, but little …

Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2023 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has also recently triggered broad interest in computer vision. Among other merits …

Deep learning: Systematic review, models, challenges, and research directions

T Talaei Khoei, H Ould Slimane… - Neural Computing and …, 2023 - Springer
Deep learning is currently undergoing a rapid transition into
automation applications. This automation transition can provide a promising framework for …