Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition; X Pan, P Chen, Y Gong, H Zhou, X Wang, Z Lin; ACL 2022 Main Conference 1, 4491--4503, 2022 | 43 | 2022 |
Synthesizing coherent story with auto-regressive latent diffusion models; X Pan, P Qin, Y Li, H Xue, W Chen; WACV 2024 (Oral), 2920--2930, 2022 | 38 | 2022 |
Kosmos-G: Generating images in context with multimodal large language models; X Pan, L Dong, S Huang, Z Peng, W Chen, F Wei; ICLR 2024, 2023 | 32 | 2023 |
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs; S Tong, E Brown, P Wu, S Woo, M Middepogu, SC Akula, J Yang, S Yang, ...; arXiv preprint arXiv:2406.16860, 2024 | 11 | 2024 |
Image Sculpting: Precise Object Editing with 3D Geometry Control; J Yenphraphai, X Pan, S Liu, D Panozzo, S Xie; CVPR 2024, 2024 | 5 | 2024 |