PreSTU: Pre-Training for Scene-Text Understanding

J Kil, S Changpinyo, X Chen, H Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

PreSTU: Pre-Training for Scene-Text Understanding

J Kil, B Changpinyo, HF Hu, S Goodman, WL Chao… - research.google
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

[PDF] PreSTU: Pre-Training for Scene-Text Understanding

J Kil, S Changpinyo, X Chen, H Hu, S Goodman… - researchgate.net
The ability to read and reason about texts in an image is often lacking in vision-and-language
(V&L) models. How can we learn V&L models that exhibit strong scene-text understanding …

PreSTU: Pre-Training for Scene-Text Understanding

J Kil, S Changpinyo, X Chen, H Hu… - 2023 IEEE/CVF …, 2023 - ieeexplore.ieee.org
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

[CITATION][C] PreSTU: Pre-Training for Scene-Text Understanding

J Kil - heendung.github.io
Jihyung Kil | Publications
2023: 1. ICCV. PreSTU: Pre-Training for Scene-Text Understanding …

PreSTU: Pre-Training for Scene-Text Understanding

J Kil, S Changpinyo, X Chen, H Hu… - 2023 IEEE/CVF …, 2023 - computer.org
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

PreSTU: Pre-Training for Scene-Text Understanding

J Kil, S Changpinyo, X Chen, H Hu, S Goodman… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

PreSTU: Pre-Training for Scene-Text Understanding

J Kil, S Changpinyo, X Chen, H Hu… - arXiv e …, 2022 - ui.adsabs.harvard.edu
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …

PreSTU: Pre-Training for Scene-Text Understanding

J Kil, SB Changpinyo, X Chen, HF Hu, S Goodman… - research.google
The ability to recognize and reason about text embedded in visual inputs is often lacking in
vision-and-language (V&L) models, perhaps because V&L pre-training methods have often …