C3LLM: Conditional Multimodal Content Generation Using Large Language Models

Z Wang, Q Duan, YW Tai, CK Tang - arXiv preprint arXiv:2405.16136, 2024 - arxiv.org
We introduce C3LLM (Conditioned-on-Three-Modalities Large Language Models), a novel
framework combining three tasks of video-to-audio, audio-to-text, and text-to-audio together …

C3LLM: Conditional Multimodal Content Generation Using Large Language Models

Z Wang, Q Duan, YW Tai, CK Tang - arXiv e-prints, 2024 - ui.adsabs.harvard.edu
Abstract: We introduce C3LLM (Conditioned-on-Three-Modalities Large Language Models),
a novel framework combining three tasks of video-to-audio, audio-to-text, and text-to-audio …