GTP-4o: Modality-prompted heterogeneous graph learning for omni-modal biomedical representation

C Li, X Liu, C Wang, Y Liu, W Yu, J Shao… - European Conference on …, 2025 - Springer
Recent advances in learning multi-modal representation have witnessed the success in
biomedical domains. While established techniques enable handling multi-modal …

IMITATE: Clinical prior guided hierarchical vision-language pre-training

C Liu, S Cheng, M Shi, A Shah, W Bai… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In the field of medical Vision-Language Pretraining (VLP), significant efforts have been
devoted to deriving text and image features from both clinical reports and associated …

Research on image recognition technology based on multimodal deep learning

J Wang, X Li, Y Jin, Y Zhong, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This project investigates the human multi-modal behavior identification algorithm utilizing
deep neural networks. According to the characteristics of different modal information …

TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction

Y Chen, H Shi, X Liu, T Shi, R Zhang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive next-token prediction is a standard pretraining method for large-scale
language models, but its application to vision tasks is hindered by the non-sequential nature …

Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation

H Qian, YD Chen, S Lou, FS Khan, X Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations, and
traditional dataset creation methods are labor intensive, costly, and require extensive …

UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

R Yang, Y Chen, Z Zhang, X Liu, Z Li, K He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the field of medical image compression, Implicit Neural Representation (INR) networks
have shown remarkable versatility due to their flexible compression ratios, yet they are …

A survey of medical vision-and-language applications and their techniques

Q Chen, R Zhao, S Wang, VMH Phan, A Hengel… - arXiv preprint arXiv …, 2024 - arxiv.org
Medical vision-and-language models (MVLMs) have attracted substantial interest due to
their capability to offer a natural language interface for interpreting complex medical data …

STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection

X Ma, Z Wu, M Ma, M Zhao, F Yang… - IEEE Journal of …, 2024 - ieeexplore.ieee.org
Convolutional neural networks and attention mechanisms have greatly benefited remote
sensing change detection (RSCD) because of their outstanding discriminative ability …

Advancing precise diagnosis of nasopharyngeal carcinoma through endoscopy-based radiomics analysis

Y Xu, J Wang, C Li, Y Su, H Peng, L Guo, S Lin, J Li… - Iscience, 2024 - cell.com
Nasopharyngeal carcinoma (NPC) has high metastatic potential and is hard to detect early.
This study aims to develop a deep learning model for NPC diagnosis using optical imagery …

Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

C Liu, Z Wan, Y Wang, H Shen, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic radiology report generation can significantly benefit the labor-intensive process of
report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial …