Title | Cited by | Year |
Normalized and geometry-aware self-attention network for image captioning L Guo, J Liu, X Zhu, P Yao, S Lu, H Lu Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 227 | 2020 |
CPTR: Full transformer network for image captioning W Liu, S Chen, L Guo, X Zhu, J Liu arXiv preprint arXiv:2101.10804, 2021 | 178 | 2021 |
MSCap: Multi-style image captioning with unpaired stylized text L Guo, J Liu, P Yao, J Li, H Lu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 115 | 2019 |
Aligning linguistic words and visual semantic units for image captioning L Guo, J Liu, J Tang, J Li, W Luo, H Lu Proceedings of the 27th ACM international conference on multimedia, 765-773, 2019 | 114 | 2019 |
VALOR: Vision-audio-language omni-perception pretraining model and dataset S Chen, X He, L Guo, X Zhu, W Wang, J Tang, J Liu arXiv preprint arXiv:2304.08345, 2023 | 65 | 2023 |
Non-autoregressive image captioning with counterfactuals-critical multi-agent learning L Guo, J Liu, X Zhu, X He, J Jiang, H Lu arXiv preprint arXiv:2005.04690, 2020 | 51 | 2020 |
Show, tell, and polish: Ruminant decoding for image captioning L Guo, J Liu, S Lu, H Lu IEEE Transactions on Multimedia 22 (8), 2149-2162, 2019 | 46 | 2019 |
OPT: Omni-perception pre-trainer for cross-modal understanding and generation J Liu, X Zhu, F Liu, L Guo, Z Zhao, M Sun, W Wang, H Lu, S Zhou, J Zhang, ... arXiv preprint arXiv:2107.00249, 2021 | 37 | 2021 |
Sketch-based image retrieval using generative adversarial networks L Guo, J Liu, Y Wang, Z Luo, W Wen, H Lu Proceedings of the 25th ACM international conference on Multimedia, 1267-1268, 2017 | 35 | 2017 |
Boosted transformer for image captioning J Li, P Yao, L Guo, W Zhang Applied Sciences 9 (16), 3260, 2019 | 33 | 2019 |
ChatBridge: Bridging modalities with large language model as a language catalyst Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu arXiv preprint arXiv:2305.16103, 2023 | 30 | 2023 |
AutoCaption: Image captioning with neural architecture search X Zhu, W Wang, L Guo, J Liu arXiv preprint arXiv:2012.09742, 2020 | 16 | 2020 |
VL-Mamba: Exploring state space models for multimodal learning Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu arXiv preprint arXiv:2403.13600, 2024 | 15 | 2024 |
MSCap: Multi-style image captioning with unpaired stylized text L Guo, J Liu, P Yao, J Li, H Lu 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4199-4208, 2019 | 11 | 2019 |
Fast sequence generation with multi-agent reinforcement learning L Guo, J Liu, X Zhu, H Lu arXiv preprint arXiv:2101.09698, 2021 | 10 | 2021 |
Image captioning with word gate and adaptive self-critical learning X Zhu, L Li, J Liu, L Guo, Z Fang, H Peng, X Niu Applied Sciences 8 (6), 909, 2018 | 6 | 2018 |
MAMO: Masked multimodal modeling for fine-grained vision-language representation learning Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu arXiv preprint arXiv:2210.04183, 2022 | 5 | 2022 |
MM21 pre-training for video understanding challenge: Video captioning with pretraining techniques S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu Proceedings of the 29th ACM International Conference on Multimedia, 4853-4857, 2021 | 5 | 2021 |
Multi-view features and hybrid reward strategies for VATEX video captioning challenge 2019 X Zhu, L Guo, P Yao, J Liu, H Lu, Z Yu, W Liu, H Lu arXiv preprint arXiv:1910.11102, 2019 | 5 | 2019 |
MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu Proceedings of the 46th International ACM SIGIR Conference on Research and …, 2023 | 4 | 2023 |