Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …

Plug-and-play grounding of reasoning in multimodal large language models

J Chen, Y Liu, D Li, X An, Z Feng, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent
capabilities in instruction following and reasoning, has greatly advanced the field of visual …

SEED-Bench-2-Plus: Benchmarking multimodal large language models with text-rich visual comprehension

B Li, Y Ge, Y Chen, Y Ge, R Zhang, Y Shan - arXiv preprint arXiv …, 2024 - arxiv.org
Comprehending text-rich visual content is paramount for the practical application of
Multimodal Large Language Models (MLLMs), since text-rich scenarios are ubiquitous in the …

Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

G Bhatia, EMB Nagoudi, F Alwajih… - arXiv preprint arXiv …, 2024 - arxiv.org
Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose
unique challenges due to the cursive and context-sensitive nature of the Arabic script. This …

Identification of illegal outdoor advertisements based on CLIP fine-tuning and OCR technology

H Zhang, Z Ding, M sharid kayes Dipu, P Lv… - IEEE …, 2024 - ieeexplore.ieee.org
Recognizing unauthorized outdoor advertising is important for a city's visual appeal,
organizational structure, and adherence to regulations. This paper aims to solve the problem …

Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps

C Hu, B Liu, M Yin, Y Zhou, X Li - arXiv preprint arXiv:2407.06309, 2024 - arxiv.org
Mobile applications (Apps) could expose children to inappropriate themes such as sexual
content, violence, and drug use. Maturity rating offers a quick and effective method for …

Can VLM Understand Children's Handwriting? An Analysis on Handwritten Mathematical Equation Recognition

C Pereira Júnior, L Rodrigues, N Costa… - … Conference on Artificial …, 2024 - Springer
Handwriting Mathematical Expression Recognition has several applications,
including the potential to make Intelligent Tutoring Systems (ITS) more accessible to …