AI is undergoing a paradigm shift with the rise of models (eg, BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models …
Contemporary neural networks still fall short of human-level generalization, which extends far beyond our direct experiences. In this paper, we argue that the underlying cause for this …
Humans have a strong intuitive understanding of the 3D environment around us. The mental model of the physics in our brain applies to objects of different materials and enables us to …
We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (eg., force and torque applied to an object in the image) to …
D Driess, Z Huang, Y Li, R Tedrake… - Conference on robot …, 2023 - proceedings.mlr.press
We present a method to learn compositional multi-object dynamics models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and …
Z Pan, A Zeng, Y Li, J Yu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Robot manipulation of multiple objects is an important topic for applications including warehouse automation, service robots performing cleaning, and large-scale object sorting …
In this work, we propose a unified framework, called Visual Reasoning with Differ-entiable Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …
H Shi, H Xu, Z Huang, Y Li… - The International Journal …, 2024 - journals.sagepub.com
Modeling and manipulating elasto-plastic objects are essential capabilities for robots to perform complex industrial and household interaction tasks (eg, stuffing dumplings, rolling …