PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models F Meng, W Shao, L Luo, Y Wang, Y Chen, Q Lu, Y Yang, T Yang, K Zhang, ... arXiv preprint arXiv:2406.11802, 2024 | | 2024 |
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality T Zhang, L Ma, Y Yan, Y Zhang, K Wang, Y Yang, Z Guo, W Shao, Y You, ... arXiv preprint arXiv:2406.08845, 2024 | | 2024 |
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang, Z Chen, X Zhu, L Lu, T Lu, ... arXiv preprint arXiv:2406.08394, 2024 | | 2024 |
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Q Lu, W Shao, Z Liu, F Meng, B Li, B Chen, S Huang, K Zhang, Y Qiao, ... arXiv preprint arXiv:2406.08451, 2024 | | 2024 |
Needle In A Multimodal Haystack W Wang, S Zhang, Y Ren, Y Duan, T Li, S Liu, M Hu, Z Chen, K Zhang, ... arXiv preprint arXiv:2406.07230, 2024 | | 2024 |
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation P Sun, Y Jiang, S Chen, S Zhang, B Peng, P Luo, Z Yuan arXiv preprint arXiv:2406.06525, 2024 | 1 | 2024 |
Learning Manipulation by Predicting Interaction J Zeng, Q Bu, B Wang, W Xia, L Chen, H Dong, H Song, D Wang, D Hu, ... arXiv preprint arXiv:2406.00439, 2024 | | 2024 |
Part123: Part-aware 3D Reconstruction from a Single-view Image A Liu, C Lin, Y Liu, X Long, Z Dou, HX Guo, P Luo, W Wang arXiv preprint arXiv:2405.16888, 2024 | | 2024 |
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View J Wang, S Dong, Y Zhu, K Yao, W Zhao, C Li, P Luo arXiv preprint arXiv:2405.17201, 2024 | | 2024 |
AnalogCoder: Analog Circuit Design via Training-Free Code Generation Y Lai, S Lee, G Chen, S Poddar, M Hu, DZ Pan, P Luo arXiv preprint arXiv:2405.14918, 2024 | 1 | 2024 |
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge C Li, Z Li, C Jing, S Liu, W Shao, Y Wu, P Luo, Y Qiao, K Zhang arXiv preprint arXiv:2405.14554, 2024 | | 2024 |
Enhance Sample Efficiency and Robustness of End-to-End Urban Autonomous Driving via Semantic Masked World Model Z Gao, Y Mu, C Chen, J Duan, P Luo, Y Lu, SE Li IEEE Transactions on Intelligent Transportation Systems, 2024 | 14 | 2024 |
Prototypical Context-Aware Dynamics for Generalization in Visual Control With Model-Based Reinforcement Learning J Wang, Q Zhang, Y Mu, D Li, D Zhao, Y Zhuang, P Luo, B Wang, J Hao IEEE Transactions on Industrial Informatics, 2024 | 1 | 2024 |
KET-QA: A Dataset for Knowledge Enhanced Table Question Answering M Hu, H Dong, P Luo, S Han, D Zhang arXiv preprint arXiv:2405.08099, 2024 | | 2024 |
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots C Wu, Y Ge, Q Guo, J Wang, Z Liang, Z Lu, Y Shan, P Luo arXiv preprint arXiv:2405.07990, 2024 | | 2024 |
Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs Y Lai, J Liu, DZ Pan, P Luo arXiv preprint arXiv:2405.06758, 2024 | 1 | 2024 |
UniFS: Universal Few-shot Instance Perception with Point Representations S Jin, R Yao, L Xu, W Liu, C Qian, J Wu, P Luo arXiv preprint arXiv:2404.19401, 2024 | | 2024 |
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI K Ying, F Meng, J Wang, Z Li, H Lin, Y Yang, H Zhang, W Zhang, Y Lin, ... arXiv preprint arXiv:2404.16006, 2024 | 8 | 2024 |
Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization Y Yang, K Zhang, Y Ge, W Shao, Z Xue, Y Qiao, P Luo ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |
Adapting LLaMA Decoder to Vision Transformer J Wang, W Shao, M Chen, C Wu, Y Liu, K Zhang, S Zhang, K Chen, P Luo arXiv preprint arXiv:2404.06773, 2024 | | 2024 |