X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org
The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT- 4) have sparked a wave of interest and research in the field of large language models …
M Xu, Z Zhang, F Wei, H Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named SAN. Our approach models the semantic …
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task …
Z Geng, B Yang, T Hang, C Li, S Gu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present InstructDiffusion a unified and generic framework for aligning computer vision tasks with human instructions. Unlike existing approaches that integrate prior knowledge …
H Luo, J Bao, Y Wu, X He, T Li - International Conference on …, 2023 - proceedings.mlr.press
Recently, the contrastive language-image pre-training, eg, CLIP, has demonstrated promising results on various downstream tasks. The pre-trained model can capture enriched …
Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language (VL) pre-training methods to …
Visual anomaly classification and segmentation are vital for automating industrial quality inspection. The focus of prior research in the field has been on training custom models for …
Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as …
In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However …