H Jeong, H Lee, C Kim, S Shin - Applied Sciences, 2024 - mdpi.com
Since the emergence of ChatGPT, research on large language models (LLMs) has actively progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …
N Wake, A Kanehira, K Sasabuchi… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. This system analyzes …
Y Duan, X Guo, Z Zhu - European Conference on Computer Vision, 2024 - Springer
Monocular depth estimation is a challenging task that predicts the pixel-wise depth from a single 2D image. Current methods typically model this problem as a regression or …
The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide …
Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate …
Physically-simulated models for human motion can generate high-quality responsive character animations, often in real-time. Natural language serves as a flexible interface for …
This work focuses on generating realistic, physically-based human behaviors from multi-modal inputs, which may only partially specify the desired motion. For example, the input …
Humanoid robots have great potential to perform various human-level skills. These skills involve locomotion, manipulation, and cognitive capabilities. Driven by advances in machine …
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled datasets, exhibit powerful capabilities in understanding complex patterns and generating …