VAR-CLIP: Text-to-image generator with visual auto-regressive modeling. Q Zhang, X Dai, N Yang, X An, Z Feng, X Ren. arXiv preprint arXiv:2408.01181, 2024 | 7 | 2024 |
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension. Y Xie, K Yang, N Yang, W Deng, X Dai, T Gu, Y Wang, X An, Y Zhao, ... arXiv preprint arXiv:2410.14332, 2024 | 1 | 2024 |
CLIP-CID: Efficient CLIP distillation via cluster-instance discrimination. K Yang, T Gu, X An, H Jiang, X Dai, Z Feng, W Cai, J Deng. arXiv preprint arXiv:2408.09441, 2024 | 1 | 2024 |
Multi-label cluster discrimination for visual representation learning. X An, K Yang, X Dai, Z Feng, J Deng. ECCV, 2024 | 1 | 2024 |
High-Fidelity Facial Albedo Estimation via Texture Quantization. Z Ran, X Ren, X An, K Yang, X Dai, Z Feng, J Guo, L Zhu, J Deng. arXiv preprint arXiv:2406.13149, 2024 | | 2024 |