Scalable and accurate self-supervised multimodal representation learning without aligned...

Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods

MS Wajid, H Terashima‐Marin, P Najafirad… - Engineering …, 2024 - Wiley Online Library

Generating an image/video caption has always been a fundamental problem of Artificial
Intelligence, which is usually performed using the potential of Deep Learning Methods …

Cited by 16 - Related articles - All 2 versions

HowToCaption: Prompting LLMs to transform video annotations at scale

N Shvetsova, A Kukleva, X Hong, C Rupprecht… - … on Computer Vision, 2025 - Springer

Instructional videos are a common source for learning text-video or even multimodal
representations by leveraging subtitles extracted with automatic speech recognition systems …

Cited by 18 - Related articles - All 3 versions

MOSAIC: Multimodal Multistakeholder-aware Visual Art Recommendation

BA Yilma, LA Leiva - arXiv preprint arXiv:2407.21758, 2024 - arxiv.org

Visual art (VA) recommendation is complex, as it has to consider the interests of users (e.g.,
museum visitors) and other stakeholders (e.g., museum curators). We study how to effectively …

Cited by 1 - Related articles - All 2 versions

iRAG: An Incremental Retrieval Augmented Generation System for Videos

MA Arefeen, B Debnath, MYS Uddin… - arXiv preprint arXiv …, 2024 - arxiv.org

Retrieval augmented generation (RAG) systems combine the strengths of language
generation and information retrieval to power many real-world applications like chatbots …

Cited by 4 - Related articles

Multimodal Isotropic Neural Architecture with Patch Embedding

H Truchan, E Naumov, R Abedin, G Palmer… - … Conference on Neural …, 2023 - Springer

Patch embedding has been a significant advancement in Transformer-based models,
particularly the Vision Transformer (ViT), as it enables handling larger image sizes and …

Cited by 2 - Related articles - All 2 versions

Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos

V Lokegaonkar, V Jaisankar, P Deepika… - … Conference on E …, 2023 - ieeexplore.ieee.org

Conventionally, evaluation for the diagnosis of Autism spectrum disorder is done by a
trained specialist through questionnaire-based formal assessments and by observation of …

Cited by 1 - Related articles - All 3 versions

