H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc
Instruction tuning large language models (LLMs) using machine-generated instruction-following data has been shown to improve zero-shot capabilities on new tasks, but the idea …
Y Li, C Zhang, G Yu, Z Wang, B Fu, G Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A …
High-quality instructions and responses are essential for the zero-shot performance of large language models on interactive natural language tasks. For interactive vision-language …
Large Vision and Language Models have enabled significant advances in fully supervised and zero-shot vision tasks. These large pre-trained architectures serve as the baseline to …
We present CoDi-2, a Multimodal Large Language Model (MLLM) for learning in-context interleaved multimodal representations. By aligning modalities with language for …
We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can …
Instruction-based image editing improves the controllability and flexibility of image manipulation via natural-language commands, without elaborate descriptions or regional masks …
Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides. However, the generation from generalized vision …
V Azizi, F Koochaki - arXiv preprint arXiv:2406.00971, 2024 - arxiv.org
Vision-Language Models (VLMs) have recently seen significant advancements through integrating with Large Language Models (LLMs). The VLMs, which process image and text …