i-Code: An integrative and composable multimodal learning framework

Z Yang, Y Fang, C Zhu, R Pryzant, D Chen… - Proceedings of the …, 2023 - ojs.aaai.org
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to
maintain a holistic worldview. Most current pretraining methods, however, are limited to one …

Video Entailment via Reaching a Structure-Aware Cross-modal Consensus

X Yao, J Gao, M Chen, C Xu - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
This paper targets the task of video entailment, which aims to achieve a thorough
comprehension and draw inferences on whether a natural language statement entails or …

TEVL: Trilinear encoder for video-language representation learning

X Man, J Shao, F Chen, M Zhang… - ACM Transactions on …, 2023 - dl.acm.org
Pre-training a model on large-scale unlabeled web videos followed by task-specific
fine-tuning is a canonical approach to learning video and language representations. However …

Vote'n'Rank: Revision of Benchmarking with Social Choice Theory

M Rofin, V Mikhailov, M Florinskiy… - arXiv preprint arXiv …, 2022 - arxiv.org
The development of state-of-the-art systems in different applied areas of machine learning
(ML) is driven by benchmarks, which have shaped the paradigm of evaluating …

Benchmarking language models on natural language understanding tasks

ВН Михайлов - 2023 - dissercat.com
Natural language processing (NLP) is an interdisciplinary subfield of computational
linguistics, computer science, and artificial intelligence aimed at the development of …