ChatBridge: Bridging modalities with large language model as a language catalyst

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Cited by 246 · Related articles · All 3 versions

[HTML] sciencedirect.com

[HTML] A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by 80 · Related articles · All 5 versions

[PDF] arxiv.org

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

The Multimodal Large Language Model (MLLM) has recently become a rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by 555 · Related articles · All 6 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 110 · Related articles · All 6 versions

[PDF] thecvf.com

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Cited by 43 · Related articles · All 3 versions

[PDF] arxiv.org

WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research

X Mei, C Meng, H Liu, Q Kong, T Ko… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …

Cited by 79 · Related articles · All 3 versions

[PDF] thecvf.com

OneLLM: One framework to align all modalities with language

J Han, K Gong, Y Zhang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal large language models (MLLMs) have gained significant attention due to their
strong multimodal understanding capability. However, existing works rely heavily on modality …

Cited by 24 · Related articles · All 3 versions

[PDF] arxiv.org

PaLI-3 vision language models: Smaller, faster, stronger

X Chen, X Wang, L Beyer, A Kolesnikov, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that
compares favorably to similar models that are 10x larger. As part of arriving at this strong …

Cited by 43 · Related articles · All 3 versions

[PDF] arxiv.org

Large multimodal models: Notes on CVPR 2023 tutorial

C Li - arXiv preprint arXiv:2306.14895, 2023 - arxiv.org

This tutorial note summarizes the presentation on "Large Multimodal Models: Towards
Building and Surpassing Multimodal GPT-4", a part of CVPR 2023 tutorial on "Recent …

Cited by 20 · Related articles · All 2 versions

[PDF] arxiv.org

Data augmentation using LLMs: Data perspectives, learning paradigms and challenges

B Ding, C Qin, R Zhao, T Luo, X Li, G Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged
as a pivotal technique for enhancing model performance by diversifying training examples …

Cited by 18 · Related articles · All 3 versions
