Large language models (LLMs) have made tremendous progress in natural language understanding and they have also been successfully adopted in other domains such as …
Tasks at the intersection of vision and language have had a profound impact in advancing the capabilities of vision-language models such as dialog-based assistants. However …
Y Wang, S Haruta, D Zeng, J Vizcarra… - arXiv preprint arXiv …, 2024 - arxiv.org
Capturing complex hierarchical human activities, from atomic actions (eg, picking up one present, moving to the sofa, unwrapping the present) to contextual events (eg, celebrating …