VALOR: Vision-audio-language omni-perception pretraining model and dataset S Chen, X He, L Guo, X Zhu, W Wang, J Tang, J Liu arXiv preprint arXiv:2304.08345, 2023 | 74 | 2023 |
Non-autoregressive image captioning with counterfactuals-critical multi-agent learning L Guo, J Liu, X Zhu, X He, J Jiang, H Lu arXiv preprint arXiv:2005.04690, 2020 | 52 | 2020 |
Global-local propagation network for RGB-D semantic segmentation S Chen, X Zhu, W Liu, X He, J Liu arXiv preprint arXiv:2101.10801, 2021 | 21 | 2021 |
VLAB: Enhancing video language pre-training by feature adapting and blending X He, S Chen, F Ma, Z Huang, X Jin, Z Liu, D Fu, Y Yang, J Liu, J Feng arXiv preprint arXiv:2305.13167, 2023 | 20 | 2023 |
An efficient sampling-based attention network for semantic segmentation X He, J Liu, W Wang, H Lu IEEE Transactions on Image Processing 31, 2850-2863, 2022 | 15 | 2022 |
Dynamic warping network for semantic video segmentation J Li, Y Zhao, X He, X Zhu, J Liu Complexity 2021 (1), 6680509, 2021 | 7 | 2021 |
MAMO: Fine-grained vision-language representations learning with masked multimodal modeling Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu Proceedings of the 46th International ACM SIGIR Conference on Research and …, 2023 | 5 | 2023 |
COSA: Concatenated sample pretrained vision-language foundation model S Chen, X He, H Li, X Jin, J Feng, J Liu arXiv preprint arXiv:2306.09085, 2023 | 5 | 2023 |
MAMO: Masked multimodal modeling for fine-grained vision-language representation learning Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu arXiv preprint arXiv:2210.04183, 2022 | 4 | 2022 |
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation W Wang, X He, Y Zhang, L Guo, J Shen, J Li, J Liu IEEE Transactions on Multimedia, 2024 | 3 | 2024 |
Consistent-separable feature representation for semantic segmentation X He, J Liu, J Fu, X Zhu, J Wang, H Lu Proceedings of the AAAI Conference on Artificial Intelligence 35 (2), 1531-1539, 2021 | 3 | 2021 |
MMNet: Multi-mask network for referring image segmentation Y Yan, X He, W Wan, J Liu arXiv preprint arXiv:2305.14969, 2023 | 2 | 2023 |
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions W Wang, Y Zhang, X He, Y Yan, Z Zhao, X Wang, J Liu arXiv preprint arXiv:2402.11265, 2024 | 1 | 2024 |
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models T Yue, J Cheng, L Guo, X Dai, Z Zhao, X He, G Xiong, Y Lv, J Liu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 1 | 2024 |
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation W Wang, T Yue, Y Zhang, L Guo, X He, X Wang, J Liu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 1 | 2024 |
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner Z Liu, S Chen, L Guo, H Li, X He, J Liu Proceedings of the 31st ACM International Conference on Multimedia, 5120-5131, 2023 | 1 | 2023 |
WL-MSR: Watch and Listen for Multimodal Subtitle Recognition J Liu, H Wang, W Wang, X He, J Liu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 1 | 2023 |
Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing X He, W Wang, Z Xu, H Wang, J Jiang, J Liu arXiv preprint arXiv:2109.02281, 2021 | 1 | 2021 |
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results H Ding, C Liu, Y Wei, N Ravi, S He, S Bai, P Torr, D Miao, X Li, Z He, ... arXiv preprint arXiv:2406.17005, 2024 | | 2024 |
CLIP-driven hierarchical fusion for referring image segmentation Y Yan, X He, J Liu International Conference on Image, Signal Processing, and Pattern …, 2024 | | 2024 |