J Chen, Y Liu, D Li, X An, Z Feng, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent capabilities in instruction following and reasoning, has greatly advanced the field of visual …
D Kevian, U Syed, X Guo, A Havens, G Dullerud… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control …
H Yang, Y Zhang, J Xu, H Lu, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
While Large Language Models (LLMs) have demonstrated exceptional multitasking abilities, fine-tuning these models on downstream, domain-specific datasets is often necessary to …
M Huang, Y Long, X Deng, R Chu, J Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the …
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models …
Y Wang, B Jiang, Y Luo, D He, P Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), such as GPT-3.5, GPT-4 and LLaMA-2, perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific …
X Huang, W Liu, X Chen, X Wang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention …
C Huang, T Yu, K Xie, S Zhang, L Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Foundation Models (FMs), with their extensive knowledge bases and complex architectures, have offered unique opportunities within the realm of recommender systems …
Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied …