Z Li, Z Rao, L Pan, P Wang, Z Xu - arXiv preprint arXiv:2301.08871, 2023 - arxiv.org
Multivariate Time Series forecasting has been an increasingly popular topic in various applications and scenarios. Recently, contrastive learning and Transformer-based models …
S Sheybani, H Hansaria, J Wood… - Advances in Neural …, 2024 - proceedings.neurips.cc
Infants possess a remarkable ability to rapidly learn and process visual inputs. As an infant's mobility increases, so does the variety and dynamics of their visual inputs. Is this change in …
X Tang, Y Wang, J Ma, X Zhang, F Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Cross-modal remote-sensing image–text retrieval (CMRSITR) is a challenging topic in the remote-sensing (RS) community. It has gained growing attention because it can be flexibly …
D Yin, Y Yang, Z Wang, H Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Fine-tuning large-scale pre-trained vision models to downstream tasks is a standard technique for achieving state-of-the-art performance on computer vision benchmarks …
IR Dave, MN Rizve, C Chen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Semi-Supervised Learning can be more beneficial for the video domain compared to images because of its higher annotation cost and dimensionality. Besides, any video …
X Zhou, A Arnab, C Sun… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current state-of-the-art video models process a video clip as a long sequence of spatio-temporal tokens. However, they do not explicitly model objects, their interactions across the …
Y Liu, L Kong, X Wu, R Chen, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents …
We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would …
Z Zhao, B Huang, S Xing, G Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding. Scale is a primary factor influencing the …