The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and …
AutoML (Automated Machine Learning) is an emerging field that aims to automate the process of building machine learning models. AutoML emerged to increase productivity …
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining. In this work, we construct …
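Several of the entries above and below revolve around contrastive image-text alignment in the style popularized by CLIP. The sketch below is a minimal, generic illustration of that idea (a symmetric InfoNCE loss over a batch of paired image and text embeddings); the function name, temperature value, and toy usage are assumptions for illustration and are not taken from any of the cited papers.

```python
# Minimal sketch of a CLIP-style contrastive objective (illustrative only;
# not the exact loss of any paper listed here).
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss; row i of each tensor is a matched image-text pair."""
    # L2-normalise so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the positive pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    # Toy usage with random embeddings standing in for encoder outputs.
    imgs = torch.randn(8, 512)
    txts = torch.randn(8, 512)
    print(clip_contrastive_loss(imgs, txts).item())
```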
X Wang, J Wu, J Chen, L Li… - Proceedings of the …, 2019 - openaccess.thecvf.com
We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …
X Chen, Z Wang, Q Hua, WL Shang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The automated container terminal (ACT) is considered the development direction of the port industry, and accurate kinematic data (speed, volume, etc.) are essential for enhancing ACT operation …
The design of widely used vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet. While one can …
In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model. Starting from the pre-trained …
Y Zeng, X Zhang, H Li, J Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Vision-language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others …
Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however …