The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …

Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …

InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model

X Dong, P Zhang, Y Zang, Y Cao, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-
form text-image composition and comprehension. This model goes beyond conventional …

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Mini-Gemini: Mining the potential of multi-modality vision language models

Y Li, Y Zhang, C Wang, Z Zhong, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-
modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating …

The (R)Evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

OLMo: Accelerating the science of language models

D Groeneveld, I Beltagy, P Walsh, A Bhagia… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) have become ubiquitous in both NLP research and in commercial
product offerings. As their commercial importance has surged, the most powerful models …

Are We on the Right Way for Evaluating Large Vision-Language Models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

VisualWebArena: Evaluating multimodal agents on realistic visual web tasks

JY Koh, R Lo, L Jang, V Duvvur, MC Lim… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous agents capable of planning, reasoning, and executing actions on the web offer
a promising avenue for automating computer tasks. However, the majority of existing …