Zero-shot urban function inference with street view images through prompting a pretrained vision-language model

W Huang, J Wang, G Cong - International Journal of Geographical …, 2024 - Taylor & Francis
Inferring urban functions using street view images (SVIs) has gained tremendous
momentum. The recent prosperity of large-scale vision-language pretrained models sheds …

DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

J Wu, Z Ni, H Wang, W Yang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Image deep features extracted by pre-trained networks are known to contain rich and
informative representations. In this paper, we present Deep Degradation Response (DDR) …

I Bet You Did Not Mean That: Testing Semantic Importance via Betting

J Teneggi, J Sulam - arXiv preprint arXiv:2405.19146, 2024 - arxiv.org
Recent works have extended notions of feature importance to semantic concepts
that are inherently interpretable to the users interacting with a black-box predictive model …

Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment

Z Song, Z Zang, Y Wang, G Yang, J Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal fusion breaks through the barriers between diverse modalities and has already
yielded numerous impressive performances. However, in various specialized fields, it is …