Prompt, plan, perform: LLM-based humanoid control via quantized imitation learning

J Li, M Zhang, N Li, D Weyns, Z Jin, K Tei - ACM Transactions on …, 2024 - dl.acm.org

Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a
feedback loop with four core functionalities: monitoring, analyzing, planning, and execution …

Cited by 8 · Related articles · All 3 versions

[HTML] mdpi.com

[HTML] A survey of robot intelligence with large language models

H Jeong, H Lee, C Kim, S Shin - Applied Sciences, 2024 - mdpi.com

Since the emergence of ChatGPT, research on large language models (LLMs) has actively
progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …

Cited by 7 · Related articles · All 2 versions

[PDF] ieee.org

GPT-4V(ision) for robotics: Multimodal task planning from human demonstration

N Wake, A Kanehira, K Sasabuchi… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org

We introduce a pipeline that enhances a general-purpose Vision Language Model,
GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. This system analyzes …

Cited by 51 · Related articles · All 2 versions

DiffusionDepth: Diffusion denoising approach for monocular depth estimation

Y Duan, X Guo, Z Zhu - European Conference on Computer Vision, 2024 - Springer

Monocular depth estimation is a challenging task that predicts the pixel-wise depth from a
single 2D image. Current methods typically model this problem as a regression or …

Cited by 55 · Related articles · All 2 versions

[PDF] arxiv.org

An interactive agent foundation model

Z Durante, B Sarkar, R Gong, R Taori, Y Noda… - arXiv preprint arXiv …, 2024 - arxiv.org

The development of artificial intelligence systems is transitioning from creating static,
task-specific models to dynamic, agent-based systems capable of performing well in a wide …

Cited by 17 · Related articles · All 3 versions

[PDF] arxiv.org

LLM-empowered state representation for reinforcement learning

B Wang, Y Qu, Y Jiang, J Shao, C Liu, W Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Conventional state representations in reinforcement learning often omit critical task-related
details, presenting a significant challenge for value networks in establishing accurate …

Cited by 5 · Related articles

[PDF] arxiv.org

SuperPADL: Scaling language-directed physics-based control with progressive supervised distillation

J Juravsky, Y Guo, S Fidler, XB Peng - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org

Physically-simulated models for human motion can generate high-quality responsive
character animations, often in real-time. Natural language serves as a flexible interface for …

Cited by 3 · Related articles · All 2 versions

[PDF] pkwyx.com

Generating Physically Realistic and Directable Human Motions from Multi-modal Inputs

A Shrestha, P Liu, G Ros, K Yuan, A Fern - European Conference on …, 2024 - Springer

This work focuses on generating realistic, physically-based human behaviors from
multi-modal inputs, which may only partially specify the desired motion. For example, the input …

Cited by 1 · Related articles · All 5 versions

[PDF] arxiv.org

Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning

Z Gu, J Li, W Shen, W Yu, Z Xie, S McCrory… - arXiv preprint arXiv …, 2025 - arxiv.org

Humanoid robots have great potential to perform various human-level skills. These skills
involve locomotion, manipulation, and cognitive capabilities. Driven by advances in machine …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org

Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

Cited by 1 · Related articles
