Contrastive language-image pre-training with knowledge graphs

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 39 · Related articles · All 2 versions

Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com

The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …

Cited by 159 · Related articles · All 5 versions

Artificial general intelligence for radiation oncology

C Liu, Z Liu, J Holmes, L Zhang, L Zhang, Y Ding… - Meta-radiology, 2023 - Elsevier

The emergence of artificial general intelligence (AGI) is transforming radiation oncology. As
prominent vanguards of AGI, large language models (LLMs) such as GPT-4 and PaLM 2 can …

Cited by 24 · Related articles · All 8 versions

Agent attention: On the integration of softmax and linear attention

D Han, T Ye, Y Han, Z Xia, S Pan, P Wan… - … on Computer Vision, 2025 - Springer

The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …

Cited by 63 · Related articles · All 2 versions

A survey of knowledge graph reasoning on graph types: Static, dynamic, and multi-modal

K Liang, L Meng, M Liu, Y Liu, W Tu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Knowledge graph reasoning (KGR), aiming to deduce new facts from existing facts based on
mined logic rules underlying knowledge graphs (KGs), has become a fast-growing research …

Cited by 93 · Related articles · All 3 versions

Slide-transformer: Hierarchical vision transformer with local self-attention

X Pan, T Ye, Z Xia, S Song… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Self-attention mechanism has been a key factor in the recent progress of Vision Transformer
(ViT), which enables adaptive feature extraction from global contexts. However, existing self …

Cited by 63 · Related articles · All 6 versions

Grounding language models for visual entity recognition

Z Xiao, M Gong, P Cascante-Bonilla, X Zhang… - … on Computer Vision, 2025 - Springer

We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our
model extends an autoregressive Multimodal Large Language Model by employing retrieval …

Cited by 8 · Related articles · All 2 versions

Heterogeneous contrastive learning for foundation models and beyond

L Zheng, B Jing, Z Li, H Tong, J He - Proceedings of the 30th ACM …, 2024 - dl.acm.org

In the era of big data and Artificial Intelligence, an emerging paradigm is to utilize contrastive
self-supervised learning to model large-scale heterogeneous data. Many existing foundation …

Cited by 11 · Related articles · All 2 versions

Transformer technology in molecular science

J Jiang, L Ke, L Chen, B Dou, Y Zhu… - Wiley …, 2024 - Wiley Online Library

A transformer is the foundational architecture behind large language models designed to
handle sequential data by using mechanisms of self‐attention to weigh the importance of …

Cited by 5 · Related articles · All 2 versions

Efficient token-guided image-text retrieval with consistent multimodal contrastive training

C Liu, Y Zhang, H Wang, W Chen… - … on Image Processing, 2023 - ieeexplore.ieee.org

Image-text retrieval is a central problem for understanding the semantic relationship
between vision and language, and serves as the basis for various visual and language …

Cited by 20 · Related articles · All 7 versions
