Alex Hauptmann
Verified email at cs.cmu.edu
Title
Cited by
Year
MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
JY He, ZQ Cheng, C Li, J Sun, Q He, W Xiang, H Chen, JP Lan, X Lin, ...
arXiv preprint arXiv:2406.19859, 2024
2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
M Li, H Li, ZQ Cheng, Y Dong, Y Zhou, JY He, Q Dai, T Mitamura, ...
arXiv preprint arXiv:2406.19236, 2024
2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Z Cheng, ZQ Cheng, JY He, J Sun, K Wang, Y Lin, Z Lian, X Peng, ...
arXiv preprint arXiv:2406.11161, 2024
2024
Multi-modal knowledge distillation for domain-adaptive action recognition
X Zhu, W Liu, CM de Melo, A Hauptmann
Synthetic Data for Artificial Intelligence and Machine Learning: Tools …, 2024
2024
Visual Grounding for User Interfaces
Y Qian, Y Lu, AG Hauptmann, O Riva
Proceedings of the 2024 Conference of the North American Chapter of the …, 2024
2024
Learning Visual-Semantic Subspace Representations for Propositional Reasoning
G Moreira, A Hauptmann, M Marques, JP Costeira
arXiv preprint arXiv:2405.16213, 2024
2024
MM-TTS: A unified framework for multimodal, prompt-induced emotional text-to-speech synthesis
X Li, ZQ Cheng, JY He, X Peng, AG Hauptmann
arXiv preprint arXiv:2404.18398, 2024
Cited by 1
2024
PhISANet: Phonetically Informed Speech Animation Network
S Medina, SL Taylor, C Stoll, G Edwards, A Hauptmann, S Watanabe, ...
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024
2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
R Zhang, L Gui, Z Sun, Y Feng, K Xu, Y Zhang, D Fu, C Li, A Hauptmann, ...
arXiv preprint arXiv:2404.01258, 2024
2024
VICAN: Very Efficient Calibration Algorithm for Large Camera Networks
G Moreira, M Marques, JP Costeira, A Hauptmann
arXiv preprint arXiv:2405.10952, 2024
2024
Adversarially masked video consistency for unsupervised domain adaptation
X Zhu, J Liang, PY Huang, A Hauptmann
arXiv preprint arXiv:2403.16242, 2024
Cited by 1
2024
SPAE: Semantic pyramid autoencoder for multimodal generation with frozen LLMs
L Yu, Y Cheng, Z Wang, V Kumar, W Macherey, Y Huang, D Ross, I Essa, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by 22
2024
Hyperbolic vs Euclidean embeddings in few-shot learning: Two sides of the same coin
G Moreira, M Marques, JP Costeira, A Hauptmann
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024
Cited by 3
2024
DocumentNet: Bridging the data gap in document pre-training
L Yu, J Miao, X Sun, J Chen, AG Hauptmann, H Dai, W Wei
Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023
Cited by 2
2023
Language Model Beats Diffusion--Tokenizer is Key to Visual Generation
L Yu, J Lezama, NB Gundavarapu, L Versari, K Sohn, D Minnen, Y Cheng, ...
arXiv preprint arXiv:2310.05737, 2023
Cited by 48
2023
Zero-shot and few-shot stance detection on varied topics via conditional generation
H Wen, AG Hauptmann
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by 11
2023
Towards open-domain Twitter user profile inference
H Wen, Z Xiao, E Hovy, AG Hauptmann
Findings of the Association for Computational Linguistics: ACL 2023, 3172-3188, 2023
Cited by 1
2023
Robust automatic detection of traffic activity
A Hauptmann, L Yu, W Liu, Y Qian, Z Cheng, L Gui
Mobility21, Carnegie Mellon University, 2023
Cited by 2
2023
Document Entity Retrieval with Massive and Noisy Pre-training
L Yu, J Miao, X Sun, J Chen, AG Hauptmann, H Dai, W Wei
arXiv preprint arXiv:2306.08937, 2023
2023
Leveraging body pose estimation for gesture recognition in human-robot interaction using synthetic data
X Zhu, CM de Melo, A Hauptmann
Synthetic Data for Artificial Intelligence and Machine Learning: Tools …, 2023
Cited by 2
2023