An Examination of the Compositionality of Large Generative Vision-Language Models

Citing articles (2 results)

Actionhub: a large-scale action video description dataset for zero-shot action recognition

J Zhou, J Liang, KY Lin, J Yang, WS Zheng - arXiv preprint arXiv …, 2024 - arxiv.org

Zero-shot action recognition (ZSAR) aims to learn an alignment model between videos and
class descriptions of seen actions that is transferable to unseen actions. The text queries …

Cited by 4 · All 2 versions

Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation

T Ma, J Zhou, Z Wang, R Qiu, J Liang - arXiv preprint arXiv:2406.09738, 2024 - arxiv.org

Developing robots capable of executing various manipulation tasks, guided by natural
language instructions and visual observations of intricate real-world environments, remains …

Cited by 1