Multi-Modal Learning with Text Merging for TextVQA

C Xu, Z Xu, Y He, S Zhou, J Guan - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
ICASSP 2022-2022 IEEE International Conference on Acoustics …, 2022 - ieeexplore.ieee.org
Text visual question answering (TextVQA) is an important task in visual text understanding: it requires understanding the text produced by a text recognition module and providing correct answers to specific questions. Recent work on TextVQA has tried to combine text recognition with multi-modal learning. However, because the text recognition output is not effectively preprocessed, existing approaches suffer from a severe loss of contextual information, which leads to unsatisfactory performance. In this work, we propose a Multi-Modal Learning framework with Text Merging (MML&TM for short) for TextVQA. We develop a text merging (TM) algorithm that effectively merges the word-level text obtained from the text recognition module into line-level and paragraph-level texts, enriching the semantic context that is crucial to visual text understanding. The TM module can be easily incorporated into the multi-modal learning framework to generate more comprehensive answers for TextVQA. We evaluate our method on the public ST-VQA dataset. Experimental results show that our TM algorithm obtains complete semantic information, which in turn helps MML&TM generate better answers for TextVQA.
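The abstract does not say how TM decides which recognized words belong together, so the following is only a hypothetical illustration of word-to-line-to-paragraph merging, not the authors' algorithm. It assumes the text recognition module returns each word with an axis-aligned bounding box, and it uses a simple geometric heuristic (same row plus small horizontal gap for lines; small vertical gap plus horizontal overlap for paragraphs). The names `Box`, `merge_words_into_lines`, `merge_lines_into_paragraphs` and the thresholds `min_overlap`, `max_gap`, `max_vgap` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical OCR output: text plus an axis-aligned box.
    (x0, y0) is the top-left corner, (x1, y1) the bottom-right; y grows downward."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def _union(boxes: List[Box], sep: str = " ") -> Box:
    """Join the texts with `sep` and return the box covering all inputs."""
    return Box(
        sep.join(b.text for b in boxes),
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
    )


def _same_row(a: Box, b: Box, min_overlap: float) -> bool:
    """True if the vertical extents overlap by at least `min_overlap`
    of the shorter box's height."""
    overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.height, b.height)
    return shorter > 0 and overlap / shorter >= min_overlap


def merge_words_into_lines(words: List[Box], min_overlap: float = 0.5,
                           max_gap: float = 1.0) -> List[Box]:
    """Greedy left-to-right pass (assumed heuristic): a word joins a line whose
    last word is on the same row and ends at most `max_gap` word-heights to
    its left; otherwise it starts a new line."""
    lines: List[List[Box]] = []
    for w in sorted(words, key=lambda b: b.x0):
        for line in lines:
            last = line[-1]
            if _same_row(last, w, min_overlap) and w.x0 - last.x1 <= max_gap * w.height:
                line.append(w)
                break
        else:  # no existing line accepted the word
            lines.append([w])
    merged = [_union(line) for line in lines]
    return sorted(merged, key=lambda b: (b.y0, b.x0))


def merge_lines_into_paragraphs(lines: List[Box], max_vgap: float = 0.8) -> List[Box]:
    """Top-to-bottom pass (assumed heuristic): a line joins the current paragraph
    if it starts within `max_vgap` line-heights below the previous line and
    overlaps it horizontally."""
    paragraphs: List[List[Box]] = []
    for ln in sorted(lines, key=lambda b: (b.y0, b.x0)):
        if paragraphs:
            prev = paragraphs[-1][-1]  # previous line in top-to-bottom order
            vgap = ln.y0 - prev.y1
            h_overlap = min(ln.x1, prev.x1) - max(ln.x0, prev.x0)
            if vgap <= max_vgap * prev.height and h_overlap > 0:
                paragraphs[-1].append(ln)
                continue
        paragraphs.append([ln])
    return [_union(p) for p in paragraphs]


if __name__ == "__main__":
    # Toy storefront: a two-line sign plus an unrelated word far away.
    words = [
        Box("Corner", 10, 10, 60, 30), Box("Cafe", 65, 10, 130, 30),
        Box("Open", 20, 40, 60, 58), Box("daily", 64, 40, 110, 58),
        Box("EXIT", 400, 200, 450, 220),
    ]
    lines = merge_words_into_lines(words)
    print([l.text for l in lines])  # ['Corner Cafe', 'Open daily', 'EXIT']
    print([p.text for p in merge_lines_into_paragraphs(lines)])  # ['Corner Cafe Open daily', 'EXIT']
```

The greedy passes keep the sketch short but assume a single column of horizontal text; multi-column layouts or rotated text would need a reading-order step before merging.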