Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

S Chowdhury, S Nag, S Dasgupta, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent
works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio …

MoBA: Mixture of Bi-directional Adapter for Multi-modal Sarcasm Detection

Y Xie, Z Zhu, X Chen, Z Chen, Z Huang - ACM Multimedia 2024, 2024 - openreview.net
In the field of multi-modal learning, model parameters are typically large, necessitating the
use of parameter-efficient fine-tuning (PEFT) techniques. These methods have been pivotal …

InMu-Net: Advancing Multi-modal Intent Detection via Information Bottleneck and Multi-sensory Processing

Z Zhu, X Cheng, Z Chen, Y Chen, Y Zhang… - ACM Multimedia …, 2024 - openreview.net
Multi-modal intent detection (MID) aims to comprehend users' intentions through diverse
modalities, which has received widespread attention in dialogue systems. Despite the …

Multimodal seed data augmentation for low-resource audio Latin Cuengh language

L Jiang, J Li, J Zhang, Y Shen - 2024 - researchsquare.com
The Latin Cuengh language, a low-resource dialect prevalent in select ethnic
minority regions of China, presents unique challenges for intelligent research and …