B Wang, Z Gu, G Liang, C Xu, B Zhang, B Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
The paper introduces the UniMER dataset, marking the first study on Mathematical Expression Recognition (MER) targeting complex real-world scenarios. The UniMER …
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial advancements in artificial intelligence, significantly enhancing the capability to understand …
We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates …
G Khalighinejad, S Scott, O Liu, KL Anderson… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal information extraction (MIE) is crucial for scientific literature, where valuable data is often spread across text, figures, and tables. In materials science, extracting structured …
Multimodal learning, a rapidly evolving field in artificial intelligence, seeks to construct more versatile and robust systems by integrating and analyzing diverse types of data, including …
X Tang, T Hu, M Ye, Y Shao, X Yin, S Ouyang… - arXiv preprint arXiv …, 2025 - arxiv.org
Chemical reasoning usually involves complex, multi-step processes that demand precise calculations, where even minor errors can lead to cascading failures. Furthermore, large …
The emergence of large language models (LLMs) has accelerated the development of self-driving laboratories (SDLs) for materials research. Despite their transformative potential …
The ability to organically reason over and with both text and images is a pillar of human intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such …