Artificial Intelligence in Image-based Cardiovascular Disease Analysis: A Comprehensive Survey and Future Outlook

X Wang, H Zhu - arXiv preprint arXiv:2402.03394, 2024 - arxiv.org
Recent advancements in Artificial Intelligence (AI) have significantly influenced the field of
Cardiovascular Disease (CVD) analysis, particularly in image-based diagnostics. Our paper …

Plug-and-play grounding of reasoning in multimodal large language models

J Chen, Y Liu, D Li, X An, Z Feng, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent
capabilities in instruction following and reasoning, has greatly advanced the field of visual …

Capabilities of large language models in control engineering: A benchmark study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra

D Kevian, U Syed, X Guo, A Havens, G Dullerud… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the capabilities of state-of-the-art large language models (LLMs)
such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control …

Unveiling the generalization power of fine-tuned large language models

H Yang, Y Zhang, J Xu, H Lu, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
While Large Language Models (LLMs) have demonstrated exceptional multitasking abilities,
fine-tuning these models on downstream, domain-specific datasets is often necessary to …

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

M Huang, Y Long, X Deng, R Chu, J Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image (T2I) generation models have significantly advanced in recent years.
However, effective interaction with these models is challenging for average users due to the …

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

LH Chen, S Lu, A Zeng, H Zhang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
This study delves into the realm of multi-modality (i.e., video and motion modalities) human
behavior understanding by leveraging the powerful capabilities of Large Language Models …

Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering

Y Wang, B Jiang, Y Luo, D He, P Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), such as GPT3.5, GPT4 and LLAMA2 perform surprisingly
well and outperform human experts on many tasks. However, in many domain-specific …

Understanding the planning of LLM agents: A survey

X Huang, W Liu, X Chen, X Wang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As Large Language Models (LLMs) have shown significant intelligence, the progress to
leverage LLMs as planning modules of autonomous agents has attracted more attention …

Foundation Models for Recommender Systems: A Survey and New Perspectives

C Huang, T Yu, K Xie, S Zhang, L Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Foundation Models (FMs), with their extensive knowledge bases and complex
architectures, have offered unique opportunities within the realm of recommender systems …

The Essential Role of Causality in Foundation World Models for Embodied AI

T Gupta, W Gong, C Ma, N Pawlowski, A Hilmkil… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in foundation models, especially in large multi-modal models and
conversational agents, have ignited interest in the potential of generally capable embodied …