LLaVA-OneVision: Easy visual task transfer

B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …

On the hidden mystery of OCR in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

Mathematics intelligent tutoring systems with handwritten input: A scoping review

L Rodrigues, FD Pereira, M Marinho, V Macario… - Education and …, 2024 - Springer
Intelligent Tutoring Systems (ITS) have been widely used to enhance math learning,
wherein teachers' involvement is prominent in achieving their full potential. Usually, ITSs …

When counting meets HMER: counting-aware network for handwritten mathematical expression recognition

B Li, Y Yuan, D Liang, X Liu, Z Ji, J Bai, W Liu… - European conference on …, 2022 - Springer
Recently, most handwritten mathematical expression recognition (HMER) methods adopt
encoder-decoder networks, which directly predict markup sequences from formula …

Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI

K Ying, F Meng, J Wang, Z Li, H Lin, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) have made significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However …

NVLM: Open frontier-class multimodal LLMs

W Dai, N Lee, B Wang, Z Yang, Z Liu, J Barker… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs)
that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary …

Exploring OCR capabilities of GPT-4V(ision): A quantitative and in-depth evaluation

Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents a comprehensive evaluation of the Optical Character Recognition
(OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …

A survey on handwritten mathematical expression recognition: The rise of encoder-decoder and GNN models

TN Truong, CT Nguyen, R Zanibbi, H Mouchère… - Pattern Recognition, 2024 - Elsevier
Recognition of handwritten mathematical expressions (HMEs) has attracted growing interest
due to steady progress in handwriting recognition techniques and the rapid emergence of …

TextHawk2: A large vision-language model excels in bilingual OCR and grounding with 16x fewer tokens

YQ Yu, M Liao, J Zhang, J Wu - arXiv preprint arXiv:2410.05261, 2024 - arxiv.org
Reading dense text and locating objects within images are fundamental abilities for Large
Vision-Language Models (LVLMs) tasked with advanced tasks. Previous LVLMs, including …