LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Unified vision-language frameworks have greatly advanced in recent years, most of which
adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu, C Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Unified vision-language frameworks have greatly advanced in recent years, most of which
adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu, C Liu… - arXiv e-prints, 2022 - ui.adsabs.harvard.edu
Unified vision-language frameworks have greatly advanced in recent years, most of which
adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu, C Liu… - 2023 IEEE/CVF …, 2023 - ieeexplore.ieee.org
Unified vision-language frameworks have greatly advanced in recent years, most of which
adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu, C Liu… - 2023 IEEE/CVF …, 2023 - computer.org
Unified vision-language frameworks have greatly advanced in recent years, most of which
adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

L Li, Z Gan, K Lin, CC Lin, Z Liu, C Liu, L Wang - cvpr.thecvf.com