Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual...

Z Yang, Y Fang, C Zhu, R Pryzant, D Chen… - Proceedings of the …, 2023 - ojs.aaai.org

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to
maintain a holistic worldview. Most current pretraining methods, however, are limited to one …

Cited by: 44 · Related articles · All 5 versions

Video Entailment via Reaching a Structure-Aware Cross-modal Consensus

X Yao, J Gao, M Chen, C Xu - Proceedings of the 31st ACM International …, 2023 - dl.acm.org

This paper targets at the task of video entailment, which aims to achieve a thorough
comprehension and draw inferences on whether a natural language statement entails or …

Tevl: Trilinear encoder for video-language representation learning

X Man, J Shao, F Chen, M Zhang… - ACM Transactions on …, 2023 - dl.acm.org

Pre-training model on large-scale unlabeled web videos followed by task-specific
fine-tuning is a canonical approach to learning video and language representations. However …

Cited by: 7 · Related articles

Vote'n'Rank: Revision of Benchmarking with Social Choice Theory

M Rofin, V Mikhailov, M Florinskiy… - arXiv preprint arXiv …, 2022 - arxiv.org

The development of state-of-the-art systems in different applied areas of machine learning
(ML) is driven by benchmarks, which have shaped the paradigm of evaluating …

Cited by: 7 · Related articles · All 6 versions

[HTML] Benchmarking language models on natural language understanding tasks (Эталонное тестирование языковых моделей на задачах понимания естественного языка)

ВН Михайлов - 2023 - dissercat.com

Natural language processing (NLP) is an interdisciplinary subfield of computational
linguistics, computer science, and artificial intelligence aimed at the development of …
