MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis JY He, ZQ Cheng, C Li, J Sun, Q He, W Xiang, H Chen, JP Lan, X Lin, ... arXiv preprint arXiv:2406.19859, 2024 | | 2024 |
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions M Li, H Li, ZQ Cheng, Y Dong, Y Zhou, JY He, Q Dai, T Mitamura, ... arXiv preprint arXiv:2406.19236, 2024 | | 2024 |
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning Z Cheng, ZQ Cheng, JY He, J Sun, K Wang, Y Lin, Z Lian, X Peng, ... arXiv preprint arXiv:2406.11161, 2024 | | 2024 |
Multi-modal knowledge distillation for domain-adaptive action recognition X Zhu, W Liu, CM de Melo, A Hauptmann Synthetic Data for Artificial Intelligence and Machine Learning: Tools …, 2024 | | 2024 |
Visual Grounding for User Interfaces Y Qian, Y Lu, AG Hauptmann, O Riva Proceedings of the 2024 Conference of the North American Chapter of the …, 2024 | | 2024 |
Learning Visual-Semantic Subspace Representations for Propositional Reasoning G Moreira, A Hauptmann, M Marques, JP Costeira arXiv preprint arXiv:2405.16213, 2024 | | 2024 |
MM-TTS: A unified framework for multimodal, prompt-induced emotional text-to-speech synthesis X Li, ZQ Cheng, JY He, X Peng, AG Hauptmann arXiv preprint arXiv:2404.18398, 2024 | 1 | 2024 |
PhISANet: Phonetically Informed Speech Animation Network S Medina, SL Taylor, C Stoll, G Edwards, A Hauptmann, S Watanabe, ... ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward R Zhang, L Gui, Z Sun, Y Feng, K Xu, Y Zhang, D Fu, C Li, A Hauptmann, ... arXiv preprint arXiv:2404.01258, 2024 | | 2024 |
VICAN: Very Efficient Calibration Algorithm for Large Camera Networks G Moreira, M Marques, JP Costeira, A Hauptmann arXiv preprint arXiv:2405.10952, 2024 | | 2024 |
Adversarially masked video consistency for unsupervised domain adaptation X Zhu, J Liang, PY Huang, A Hauptmann arXiv preprint arXiv:2403.16242, 2024 | 1 | 2024 |
SPAE: Semantic pyramid autoencoder for multimodal generation with frozen LLMs L Yu, Y Cheng, Z Wang, V Kumar, W Macherey, Y Huang, D Ross, I Essa, ... Advances in Neural Information Processing Systems 36, 2024 | 22 | 2024 |
Hyperbolic vs Euclidean embeddings in few-shot learning: Two sides of the same coin G Moreira, M Marques, JP Costeira, A Hauptmann Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024 | 3 | 2024 |
DocumentNet: Bridging the data gap in document pre-training L Yu, J Miao, X Sun, J Chen, AG Hauptmann, H Dai, W Wei Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 2 | 2023 |
Language Model Beats Diffusion--Tokenizer is Key to Visual Generation L Yu, J Lezama, NB Gundavarapu, L Versari, K Sohn, D Minnen, Y Cheng, ... arXiv preprint arXiv:2310.05737, 2023 | 48 | 2023 |
Zero-shot and few-shot stance detection on varied topics via conditional generation H Wen, AG Hauptmann Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 11 | 2023 |
Towards open-domain Twitter user profile inference H Wen, Z Xiao, E Hovy, AG Hauptmann Findings of the Association for Computational Linguistics: ACL 2023, 3172-3188, 2023 | 1 | 2023 |
Robust automatic detection of traffic activity A Hauptmann, L Yu, W Liu, Y Qian, Z Cheng, L Gui Mobility21, Carnegie Mellon University, 2023 | 2 | 2023 |
Document Entity Retrieval with Massive and Noisy Pre-training L Yu, J Miao, X Sun, J Chen, AG Hauptmann, H Dai, W Wei arXiv preprint arXiv:2306.08937, 2023 | | 2023 |
Leveraging body pose estimation for gesture recognition in human-robot interaction using synthetic data X Zhu, CM de Melo, A Hauptmann Synthetic Data for Artificial Intelligence and Machine Learning: Tools …, 2023 | 2 | 2023 |