EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions

E Zhang, M Daum, D He, B Haynes, R Krishna… - Proceedings of the …, 2023 - dl.acm.org
We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over
videos from limited user interactions. The user only provides a handful of positive and …

Task Me Anything

J Zhang, W Huang, Z Ma, O Michel, D He… - arXiv preprint arXiv …, 2024 - arxiv.org
Benchmarks for large multimodal language models (MLMs) now serve to simultaneously
assess the general capabilities of models instead of evaluating for a specific capability. As a …

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions [Technical Report]

E Zhang, M Daum, D He, B Haynes, R Krishna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over
videos from limited user interactions. The user only provides a handful of positive and …