Incorporating physics into data-driven computer vision

A Kadambi, C de Melo, CJ Hsieh… - Nature Machine …, 2023 - nature.com
Many computer vision techniques infer properties of our physical world from images.
Although images are formed through the physics of light and mechanics, computer vision …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Physically grounded vision-language models for robotic manipulation

J Gao, B Sarkar, F Xia, T Xiao, J Wu… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Recent advances in vision-language models (VLMs) have led to improved performance on
tasks such as visual question answering and image captioning. Consequently, these models …

On the binding problem in artificial neural networks

K Greff, S Van Steenkiste, J Schmidhuber - arXiv preprint arXiv …, 2020 - arxiv.org
Contemporary neural networks still fall short of human-level generalization, which extends
far beyond our direct experiences. In this paper, we argue that the underlying cause for this …

3D neural scene representations for visuomotor control

Y Li, S Li, V Sitzmann, P Agrawal… - Conference on Robot …, 2022 - proceedings.mlr.press
Humans have a strong intuitive understanding of the 3D environment around us. The mental
model of the physics in our brain applies to objects of different materials and enables us to …

PhysGen: Rigid-body physics-grounded image-to-video generation

S Liu, Z Ren, S Gupta, S Wang - European Conference on Computer …, 2025 - Springer
We present PhysGen, a novel image-to-video generation method that converts a single
image and an input condition (e.g., force and torque applied to an object in the image) to …

Learning multi-object dynamics with compositional neural radiance fields

D Driess, Z Huang, Y Li, R Tedrake… - Conference on robot …, 2023 - proceedings.mlr.press
We present a method to learn compositional multi-object dynamics models from image
observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and …

Algorithms and systems for manipulating multiple objects

Z Pan, A Zeng, Y Li, J Yu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Robot manipulation of multiple objects is an important topic for applications including
warehouse automation, service robots performing cleaning, and large-scale object sorting …

Dynamic visual reasoning by learning differentiable physics models from video and language

M Ding, Z Chen, T Du, P Luo… - Advances In Neural …, 2021 - proceedings.neurips.cc
In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

RoboCraft: Learning to see, simulate, and shape elasto-plastic objects in 3D with graph networks

H Shi, H Xu, Z Huang, Y Li… - The International Journal …, 2024 - journals.sagepub.com
Modeling and manipulating elasto-plastic objects are essential capabilities for robots to
perform complex industrial and household interaction tasks (e.g., stuffing dumplings, rolling …