KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier
Large language models (LLMs) are a special class of pretrained language models (PLMs) obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …
The Multimodal Large Language Model (MLLM) has recently become a rising research hotspot; it uses powerful Large Language Models (LLMs) as a brain to perform …
Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened …
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …
The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years, yet the limited size of existing audio-language datasets poses challenges for …
Multimodal large language models (MLLMs) have gained significant attention due to their strong multimodal understanding capability. However, existing works rely heavily on modality …
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong …
C Li - arXiv preprint arXiv:2306.14895, 2023 - arxiv.org
This tutorial note summarizes the presentation on "Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4", part of the CVPR 2023 tutorial on "Recent …
In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples …