ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

J Zhou, J Liang, KY Lin, J Yang, WS Zheng - arXiv preprint arXiv …, 2024 - arxiv.org
Zero-shot action recognition (ZSAR) aims to learn an alignment model between videos and
class descriptions of seen actions that is transferable to unseen actions. The text queries …

Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation

T Ma, J Zhou, Z Wang, R Qiu, J Liang - arXiv preprint arXiv:2406.09738, 2024 - arxiv.org
Developing robots capable of executing various manipulation tasks, guided by natural
language instructions and visual observations of intricate real-world environments, remains …