InternLM2 technical report

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org

In this report, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Cited by 18 · Related articles · All 2 versions

[PDF] arxiv.org

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

X Dong, P Zhang, Y Zang, Y Cao, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its
progression has been hindered by challenges in comprehending fine-grained visual content …

Cited by 10 · Related articles · All 2 versions

[PDF] arxiv.org

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

S Hao, Y Gu, H Luo, T Liu, X Shao, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs)
to address complex problems and enhance robustness and interpretability. Despite the flux …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

K Huang, F Mo, H Li, Y Li, Y Zhang, W Yi, Y Mao… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of Large Language Models (LLMs) demonstrates remarkable
multilingual capabilities in natural language processing, attracting global attention in both …

Related articles · All 2 versions

[PDF] arxiv.org

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

J Tang, C Lin, Z Zhao, S Wei, B Wu, Q Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-centric visual question answering (VQA) has made great strides with the development
of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

C Wang, H Duan, S Zhang, D Lin, K Chen - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, the large language model (LLM) community has shown increasing interest in
enhancing LLMs' capability to handle extremely long documents. As various long-text …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Y Song, H Xie, Z Zhang, B Wen, L Ma, Z Mi… - arXiv preprint arXiv …, 2024 - arxiv.org

Exploiting activation sparsity is a promising approach to significantly accelerating the
inference process of large language models (LLMs) without compromising performance …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Needle In A Multimodal Haystack

W Wang, S Zhang, Y Ren, Y Duan, T Li, S Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

With the rapid advancement of multimodal large language models (MLLMs), their evaluation
has become increasingly comprehensive. However, understanding long multimodal content …

Related articles · All 2 versions

[PDF] arxiv.org

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that
unifies visual perception, understanding, and generation within a single framework. Unlike …

Related articles · All 2 versions

[PDF] arxiv.org

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

H Que, J Liu, G Zhang, C Zhang, X Qu, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to
expand the model's fundamental understanding of specific downstream domains (e.g., math …

Related articles · All 2 versions
