IISAN: Efficiently adapting multimodal representation for sequential recommendation with decoupled PEFT

J Fu, X Ge, X Xin, A Karatzoglou, I Arapakis… - Proceedings of the 47th …, 2024 - dl.acm.org
Multimodal foundation models are transformative in sequential recommender systems,
leveraging powerful representation learning capabilities. While Parameter-efficient Fine …

Open Panoramic Segmentation

J Zheng, R Liu, Y Chen, K Peng, C Wu, K Yang… - … on Computer Vision, 2025 - Springer
Panoramic images, capturing a 360° field of view (FoV), encompass
omnidirectional spatial information crucial for scene understanding. However, it is not only …

Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation

Y Zhang, MH Guo, M Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
CLIP has demonstrated marked progress in visual recognition due to its powerful pre-training
on large-scale image-text pairs. However, it still remains a critical challenge: how to …

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

J Schramm, N Vödisch, K Petek, BR Kiran… - arXiv preprint arXiv …, 2024 - arxiv.org
Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role
in facilitating planning and decision-making for mobile robots. Although recent vision-only …

LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation

H Shi, SD Dao, J Cai - International Journal of Computer Vision, 2024 - Springer
Open-vocabulary (OV) semantic segmentation has attracted increasing attention in recent
years, which aims to recognize objects in an open class set for real-world applications …

Laddering vision foundation model for remote sensing image change detection

Y Liu, G Zhou - Journal of Applied Remote Sensing, 2024 - spiedigitallibrary.org
This paper proposes a novel laddering vision foundation model for change detection (CD) of
remote sensing images. Current approaches have limitations in simultaneously extracting …

As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?

A Hu, J Gu, F Pinto, K Kamnitsas, P Torr - arXiv preprint arXiv:2403.12693, 2024 - arxiv.org
Foundation models pre-trained on web-scale vision-language data, such as CLIP, are
widely used as cornerstones of powerful machine learning systems. While pre-training offers …

Automatic segmentation of 15 critical anatomical labels and measurements of cardiac axis and cardiothoracic ratio in fetal four chambers using nnU-NetV2

B Liang, F Peng, D Luo, Q Zeng, H Wen… - BMC Medical Informatics …, 2024 - Springer
Background: Accurate segmentation of critical anatomical structures in fetal four-chamber
view images is essential for the early detection of congenital heart defects. Current prenatal …

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

K Li, R Liu, X Cao, D Meng, Z Wang - arXiv preprint arXiv:2410.01768, 2024 - arxiv.org
Remote sensing image plays an irreplaceable role in fields such as agriculture, water
resources, military, and disaster relief. Pixel-level interpretation is a critical aspect of remote …

Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation

J Fu, X Ge, X Xin, A Karatzoglou, I Arapakis… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models (MFMs) have revolutionized sequential recommender
systems through advanced representation learning. While Parameter-efficient Fine-tuning …