Inter-intra modal representation augmentation with DCT-transformer adversarial network for...

Y Bin, H Li, Y Xu, X Xu, Y Yang, HT Shen - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Most existing cross-modal retrieval methods employ two-stream encoders with different
architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such …

Cited by 14 · Related articles · All 3 versions

Frequency information disentanglement network for video-based person re-identification

L Liu, X Yang, N Wang, X Gao - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org

Recently, most video-based person re-identification (Re-ID) methods adopt complex model
or multi-scaled information to explore more discriminative spatio-temporal clues, thus …

Cited by 4 · Related articles · All 5 versions

HSMH: A hierarchical sequence multi-hop reasoning model with reinforcement learning

D Wang, B Li, B Song, C Chen… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

The incompleteness of knowledge graphs (KGs) negatively impacts the performance of KGs
in downstream applications (e.g., recommendation systems and information retrieval). This …

Cited by 3 · Related articles · All 4 versions

ITContrast: contrastive learning with hard negative synthesis for image-text matching

F Wu, Q Wang, Z Wang, S Yu, Y Li, B Zhang… - The Visual Computer, 2024 - Springer

Image-text matching aims to bridge vision and language so as to match the instance of one
modality with the instance of another modality. Recent years have seen considerable …

Cited by 1 · Related articles · All 6 versions
