Multimodal LLM enhanced cross-lingual cross-modal retrieval

Y Wang, L Wang, Q Zhou, Z Wang, H Li, G Hua… - Proceedings of the …, 2024 - dl.acm.org
Cross-lingual cross-modal retrieval (CCR) aims to retrieve visually relevant content based
on non-English queries, without relying on human-labeled cross-modal data pairs during …

Prompt Engineering for Large Models in Generative AI: Methods, Current Status, and Prospects

黄峻, 林飞, 杨静, 王兴霞, 倪清桦… - 智能科学与技术 …, 2024 - infocomm-journal.com
Large language models and vision-language models have shown great potential in applications across many fields and have become a research hotspot. However, issues such as hallucination, knowledge transfer, and alignment with human intent still limit the performance of large models. First, prompt engineering and alignment techniques are discussed …

Foundation Models for Video Understanding: A Survey

N Madan, A Møgelmose, R Modi, YS Rawat… - arXiv preprint arXiv …, 2024 - arxiv.org
Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various
video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …

How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey

Y Qi, H Li, Y Song, X Wu, J Luo - arXiv preprint arXiv:2412.08158, 2024 - arxiv.org
The exploration of various vision-language tasks, such as visual captioning, visual question
answering, and visual commonsense reasoning, is an important area in artificial intelligence …