A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models X Shuai, H Ding, X Ma, R Tu, YG Jiang, D Tao arXiv preprint arXiv:2406.14555, 2024 | | 2024 |
Extracting Training Data from Unconditional Diffusion Models Y Chen, X Ma, D Zou, YG Jiang arXiv preprint arXiv:2406.12752, 2024 | | 2024 |
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation J Wang, Y Jiang, Z Yuan, B Peng, Z Wu, YG Jiang arXiv preprint arXiv:2406.09399, 2024 | | 2024 |
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction Z Xing, Q Dai, Z Weng, Z Wu, YG Jiang arXiv preprint arXiv:2406.06465, 2024 | | 2024 |
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion S Tu, Q Dai, Z Zhang, S Xie, ZQ Cheng, C Luo, X Han, Z Wu, YG Jiang arXiv preprint arXiv:2405.20325, 2024 | | 2024 |
MOSS: An Open Conversational Large Language Model T Sun, X Zhang, Z He, P Li, Q Cheng, X Liu, H Yan, Y Shao, Q Tang, ... Machine Intelligence Research, 1-18, 2024 | 2 | 2024 |
FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning L Zhou, Y He, K Zhai, X Liu, S Liu, X Ma, G Ye, YG Jiang, H Chai arXiv preprint arXiv:2405.11811, 2024 | | 2024 |
Imbalanced gradients: a subtle cause of overestimated adversarial robustness X Ma, L Jiang, H Huang, Z Weng, J Bailey, YG Jiang Machine Learning 113 (5), 2301-2326, 2024 | 4 | 2024 |
PoseAnimate: Zero-shot high fidelity pose controllable character animation B Zhu, F Wang, T Lu, P Liu, J Su, J Liu, Y Zhang, Z Wu, YG Jiang, GJ Qi arXiv preprint arXiv:2404.13680, 2024 | 1 | 2024 |
Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models Y Li, W Tian, Y Jiao, J Chen, YG Jiang arXiv preprint arXiv:2404.12966, 2024 | | 2024 |
The Dog Walking Theory: Rethinking Convergence in Federated Learning K Zhai, Y Gao, X Ma, D Zou, G Ye, YG Jiang arXiv preprint arXiv:2404.11888, 2024 | | 2024 |
LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network Y Su, Z Chen, Z Shao, Y Du, Z Ji, J Bai, Y Zhou, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4979-4987, 2024 | | 2024 |
Instance-aware multi-camera 3D object detection with structural priors mining and self-boosting learning Y Jiao, Z Jie, S Chen, L Cheng, J Chen, L Ma, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (3), 2598-2606, 2024 | 3 | 2024 |
NuScenes-QA: A multi-modal visual question answering benchmark for autonomous driving scenario T Qian, J Chen, L Zhuo, Y Jiao, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4542-4550, 2024 | 48 | 2024 |
FDGaussian: Fast Gaussian splatting from single image via geometric-aware diffusion model Q Feng, Z Xing, Z Wu, YG Jiang arXiv preprint arXiv:2403.10242, 2024 | 3 | 2024 |
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models Y Jiao, S Chen, Z Jie, J Chen, L Ma, YG Jiang arXiv preprint arXiv:2403.07304, 2024 | 2 | 2024 |
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection L Meng, X Dai, J Yang, D Chen, Y Chen, M Liu, YL Chen, Z Wu, L Yuan, ... Advances in Neural Information Processing Systems 36, 2024 | 3 | 2024 |
Multi-prompt alignment for multi-source unsupervised domain adaptation H Chen, X Han, Z Wu, YG Jiang Advances in Neural Information Processing Systems 36, 2024 | 11 | 2024 |
CDistNet: Perceiving multi-domain character distance for robust text recognition T Zheng, Z Chen, S Fang, H Xie, YG Jiang International Journal of Computer Vision 132 (2), 300-318, 2024 | 45 | 2024 |
Instruction-Guided Scene Text Recognition Y Du, Z Chen, Y Su, C Jia, YG Jiang arXiv preprint arXiv:2401.17851, 2024 | | 2024 |