P Li, M Zhang, P Lin, J Wan… - Computational …, 2022 - Wiley Online Library
People can accurately describe an image by constantly referring to the visual information
and key text information of the image. Inspired by this idea, we propose the VTR‐PTM …