Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language...

2 results citing this article

[PDF] arxiv.org

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Cited by 55 · Related articles · All 2 versions

[PDF] arxiv.org

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding

Y Zhang, Z Zhao, Z Chen, Z Ding, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in multimodal large language models (MLLMs) have opened new
avenues for video understanding. However, achieving high fidelity in zero-shot video tasks …

Cited by 1 · Related articles · All 2 versions
