MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on …

MathVista: Evaluating mathematical reasoning of foundation models in visual contexts

P Lu, H Bansal, T Xia, J Liu, C Li, H Hajishirzi… - arXiv preprint arXiv …, 2023 - arxiv.org
Although Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit
impressive skills in various domains, their ability for mathematical reasoning within visual …

Large language models: A survey

S Minaee, T Mikolov, N Nikzad, M Chenaghlu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have drawn a lot of attention due to their strong
performance on a wide range of natural language tasks, since the release of ChatGPT in …

Cognitive architectures for language agents

TR Sumers, S Yao, K Narasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent efforts have incorporated large language models (LLMs) with external resources (e.g.,
the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or …

MM1: Methods, analysis & insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Mini-Gemini: Mining the potential of multi-modality vision language models

Y Li, Y Zhang, C Wang, Z Zhong, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-
modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating …

Fine-tuning ChatGPT for automatic scoring

E Latif, X Zhai - Computers and Education: Artificial Intelligence, 2024 - Elsevier
This study highlights the potential of fine-tuned ChatGPT (GPT-3.5) for automatically scoring
student written constructed responses using example assessment tasks in science …