Delving into out-of-distribution detection with vision-language representations

Y Ming, Z Cai, J Gu, Y Sun, W Li… - Advances in neural …, 2022 - proceedings.neurips.cc
Recognizing out-of-distribution (OOD) samples is critical for machine learning systems
deployed in the open world. The vast majority of OOD detection methods are driven by a …

ImageBART: Bidirectional context with multinomial diffusion for autoregressive image synthesis

P Esser, R Rombach, A Blattmann… - Advances in neural …, 2021 - proceedings.neurips.cc
Autoregressive models and their sequential factorization of the data likelihood have recently
demonstrated great potential for image representation and synthesis. Nevertheless, they …

TeCH: Text-guided reconstruction of lifelike clothed humans

Y Huang, H Yi, Y Xiu, T Liao, J Tang… - … Conference on 3D …, 2024 - ieeexplore.ieee.org
Despite recent research advancements in reconstructing clothed humans from a single
image, accurately restoring the “unseen regions” with high-level details remains an …

ERNIE-ViL 2.0: Multi-view contrastive learning for image-text pre-training

B Shan, W Yin, Y Sun, H Tian, H Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent Vision-Language Pre-trained (VLP) models based on dual encoders have attracted
extensive attention from academia and industry due to their superior performance on various …

Diversity and bias in audio captioning datasets

I Martín-Morató, A Mesaros - 2021 - trepo.tuni.fi
Describing soundscapes in sentences allows better understanding of the acoustic scene
than a single label indicating the acoustic scene class or a set of audio tags indicating the …

Dual-modal transformer with enhanced inter- and intra-modality interactions for image captioning

D Kumar, V Srivastava, DE Popescu, JD Hemanth - Applied Sciences, 2022 - mdpi.com
Image captioning is oriented towards describing an image with the best possible choice of
words that can convey a semantic, relatable meaning of the depicted scene. Different …

JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models

Y Wada, K Kaneda, K Sugiura - arXiv preprint arXiv:2311.04192, 2023 - arxiv.org
Image captioning studies heavily rely on automatic evaluation metrics such as BLEU and
METEOR. However, such n-gram-based metrics have been shown to correlate poorly with …

Boosting generic visual-linguistic representation with dynamic contexts

G Ma, Y Bai, W Zhang, T Yao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Pretraining large models on large-scale multi-modal corpora has accelerated the
development of visual-linguistic (VL) representation and achieved great success on various …

COSMic: a coherence-aware generation metric for image descriptions

M Inan, P Sharma, B Khalid, R Soricut, M Stone… - arXiv preprint arXiv …, 2021 - arxiv.org
Developers of text generation models rely on automated evaluation metrics as a stand-in for
slow and expensive manual evaluations. However, image captioning metrics have struggled …

Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs

DL Fernandes, MHF Ribeiro, FR Cerqueira… - arXiv preprint arXiv …, 2022 - arxiv.org
Several services for people with visual disabilities have emerged recently, driven by
advances in the areas of Assistive Technology and Artificial Intelligence. Despite the growth …