Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond. L Chen, Y Zhang, S Ren, H Zhao, Z Cai, Y Wang, P Wang, T Liu, B Chang. FMDM@NeurIPS 2023. Cited by 28.
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. J Bai, T He, Y Wang, J Guo, H Hu, Z Liu, J Bian. arXiv preprint arXiv:2402.13185, 2024. Cited by 19.
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain. L Chen, Y Zhang, S Ren, H Zhao, Z Cai, Y Wang, P Wang, X Meng, T Liu, et al. Findings of ACL 2024. Cited by 16.
GAIA: Zero-shot Talking Avatar Generation. T He*, J Guo*, R Yu*, Y Wang*, J Zhu, K An, L Li, X Tan, C Wang, H Hu, et al. ICLR 2024. Cited by 9.
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation. Y Wang, J Guo, J Bai, R Yu, T He, X Tan, X Sun, J Bian. arXiv preprint arXiv:2405.15758, 2024. Cited by 5.
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement. R Yu, T He, A Zhang, Y Wang, J Guo, X Tan, C Liu, J Chen, J Bian. arXiv preprint arXiv:2406.08096, 2024. Cited by 4.
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? Y Wang, S Ren, R Gao, L Yao, Q Guo, K An, J Bai, X Sun. NAACL 2024. Cited by 3.
Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints. K An, S Si, H Hu, H Zhao, Y Wang, Q Guo, B Chang. arXiv preprint arXiv:2409.14469, 2024.